mitar opened 5 years ago
I'd love to see this happen, but it seems like a lot of work due to having to query the system catalog (at what point do you do this? do you do it lazily or eagerly? do you do it automatically or require manual registration?), cache that information, and ensure the cache doesn't become stale if things change. Related discussion for a .NET driver that implemented it: npgsql/npgsql#441
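For context, this is roughly the kind of catalog lookup such a feature would have to run and cache (a minimal sketch using node-postgres; the exact query shape and caching strategy are just my assumptions):

const { Pool } = require('pg')
const pool = new Pool()

// Given the OID of a composite type, list its field names and field type OIDs.
// pg_type.typrelid points at the backing relation; pg_attribute holds its columns.
async function compositeFields (typeOid) {
  const { rows } = await pool.query(
    `SELECT a.attname, a.atttypid
     FROM pg_type t
     JOIN pg_attribute a ON a.attrelid = t.typrelid
     WHERE t.oid = $1 AND a.attnum > 0 AND NOT a.attisdropped
     ORDER BY a.attnum`,
    [typeOid]
  )
  return rows // e.g. [{ attname: 'id', atttypid: 2950 }, { attname: 'body', atttypid: 25 }]
}

The open questions above (when to run it, how to invalidate the cache) are exactly about this kind of query.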
It's a shame the nested type information isn't provided by the protocol itself...
Edit: here's a link to the relevant queries/business logic in that npgsql library, if it helps:
Another edit: this is how it seems to be done in Python land, much easier to follow:
Some previous discussions for posterity:
@vitaly-t's abandoned attempt at a minimal parser (only gets the individual values as strings):
Another library in another language (Ruby this time) that's able to handle them:
I'm hoping if we pool together enough information we can figure out a decent solution, which I'm assuming at this point would live in an external package.
@noinkling You are looking at it the wrong way. The main reason I abandoned my attempts is that the tuple format is now considered legacy, sort of obsolete, whereas JSON+JSONB is the new recommended approach. And this is where the PostgreSQL team has been focusing since v9 of the server: making JSON as fast as possible. Tuples are history now. You should avoid using them.
@vitaly-t JSON only supports four primitive types: strings, a single JS-style "number" type, booleans, and null. If you have data using a type that doesn't match one of those (the timestamp/date types, for example), you lose type information and the ability for the library to automatically parse those values (via pg-types). You're left with having to do your own parsing on a query-by-query basis (since you need to know the structure of the result), or using a hacky and complicated workaround like manually outputting types in the result.
And even ignoring that, there's the fact that casting to JSON in Postgres can have less than ideal results in some edge cases. For example:
=# SELECT original, to_json(original) FROM (VALUES (timestamptz '20000-01-01'), (timestamptz '500-01-01 BC')) v (original);
original │ to_json
───────────────────────────┼────────────────────────────────
20000-01-01 00:00:00+00 │ "20000-01-01T00:00:00+00:00"
0500-01-01 00:00:00+00 BC │ "0500-01-01T00:00:00+00:00 BC"
Believe it or not those aren't valid ISO date strings:
https://en.wikipedia.org/wiki/ISO_8601#Years
To represent years before 0000 or after 9999, the standard also permits the expansion of the year representation but only by prior agreement between the sender and the receiver. An expanded year representation [±Y̲YYYY] must have an agreed-upon number of extra year digits beyond the four-digit minimum, and it must be prefixed with a + or − sign instead of the more common AD/BC (or CE/BCE) notation; by convention 1 BC is labelled +0000, 2 BC is labeled −0001, and so on.
In JS:
> new Date("20000-01-01T00:00:00+00:00")
Invalid Date
> new Date("0500-01-01T00:00:00+00:00 BC")
Invalid Date
In contrast, the date parser written for this library already supports the original formats fine (but not the JSON ones). In order to use that parser, you could keep the original format in JSON by casting to text first, but suffice it to say that it can make certain queries significantly more complicated/verbose/ugly, and you still have the first issue.
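To illustrate what that workaround looks like (a hypothetical example; the table and column names are made up):

const { Client } = require('pg')

async function fetchPosts () {
  const client = new Client()
  await client.connect()
  // Keep Postgres' native timestamp format inside the JSON by casting to text...
  const { rows } = await client.query(`
    SELECT json_build_object(
      'id', p.id,
      'created_at', p.created_at::text
    ) AS post
    FROM posts p
  `)
  await client.end()
  // ...but rows[0].post.created_at is now a raw Postgres timestamp string that
  // still has to be pushed through a date parser by hand, query by query.
  return rows
}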
If composite types/tuples/row values/records (whatever they're called) were parsed properly I wouldn't have to worry about any of that.
the tuple format is now considered legacy, sort of obsolete, whereas JSON+JSONB is the new recommended approach
Can you provide any support for this claim? I have not seen that anywhere.
Moreover, JSON does not support many kinds of values. For example, Infinity and NaN cannot be transported in standard JSON (and PostgreSQL emits strict standard JSON).
And this is where the PostgreSQL team has been focusing since v9 of the server: making JSON as fast as possible.
That is true. JSON is really fast. Moreover, parsing JSON on the Node side is much faster than anything else because it is so heavily optimized in Node.js. I made some benchmarks to confirm that.
On the other hand, parsing JSON is recursive, and you do not know what you are getting until you get it, so you have to scan tokens first and then convert them. With the tuple approach from PostgreSQL, you have all the information about the structure in advance, so at least in theory you should be able to parse the value directly, knowing what is where, and just map it, without first having to scan what each token is. I have seen this done in some projects, but I cannot find a reference now.
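To make that concrete (a hypothetical sketch, not something node-postgres does today): once the field names and type OIDs of a row type are known up front, the already-split field strings can be mapped straight through pg-types without any token scanning:

const types = require('pg-types')

// fields: [{ name, oid }] known in advance (e.g. from the system catalog);
// rawValues: the split field strings of one composite value, with null for NULLs.
function mapComposite (fields, rawValues) {
  const result = {}
  fields.forEach(({ name, oid }, i) => {
    const raw = rawValues[i]
    result[name] = raw === null ? null : types.getTypeParser(oid)(raw)
  })
  return result
}

// mapComposite([{ name: 'id', oid: 23 }, { name: 'created', oid: 1184 }],
//              ['42', '2020-01-01 00:00:00+00'])
// -> { id: 42, created: Date }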
On the other hand, if messaging were done in something like Cap'n Proto, parsing would take zero time, instead of the current approach of copying from one in-memory representation to another.
I tried building something that would also parse embedded types automatically, but I got stuck on the case where a composite type is created implicitly inside the query, for example when you select just a subset of fields from a table and nest those results into a larger query. How do you obtain information about which fields were selected? How do you obtain typing information for the whole structure? PostgreSQL should have that internally, no? Does anyone know of any other implementation which handles this?
See my question on the mailing list here: https://www.postgresql.org/message-id/flat/CAKLmikMrm778-eETLvVAd1W_u0R8TB%2BsuAFO6jhMTmXQg3yhGg%40mail.gmail.com
I regularly use DBeaver with Postgres (a Java app), and that client resolves and displays tuples the right way.
For example:
select customer from customer limit 2
Instead, the same query on node-postgres returns:
[
{
customer: '(72ef86cf-2f38-4848-9f26-808b61963b0f,Rey,Schinner,"Regional Research Facilitator",Usability,)'
},
{
customer: '(45a67e2f-fe57-459b-af7b-7284ddb9e47c,Evelyn,Harris,"Corporate Branding Director",Directives,)'
}
]
Hm, this simple query might work. But what about something like SELECT _id, body, (SELECT array_agg(ROW(comments._id, comments.body)) FROM comments WHERE comments.post_id=posts._id) AS comments FROM posts? That is, when you generate ad-hoc types in the results?
I think it is correct that an "ad-hoc" type stays unparsed, e.g.:
select
c.id,
c."date",
(
select
array_agg(row(cr.id, cr.quantity))
from
cart_row cr
where
cr.cart_id = c.id) as r
from
cart c
but if I use the table tuple I get:
select
c.id,
c."date",
(
select
array_agg(cr)
from
cart_row cr
where
cr.cart_id = c.id) as r
from
cart c
If you need a type different from the table's one, I think you have to declare it and then cast, e.g.:
CREATE TYPE partial_cart_row as (
"id" uuid,
"quantity" smallint
);
select
c.id,
c."date",
(
select
array_agg(row(cr.id, cr.quantity)::partial_cart_row)
from
cart_row cr
where
cr.cart_id = c.id) as r
from
cart c
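A hedged side note on the declared-type approach: once partial_cart_row exists, it has a stable entry in pg_type, so a client could look up its OID and register a custom parser for it in node-postgres. In the sketch below, parseCompositeLiteral is a placeholder for any composite-literal parser (e.g. the packages mentioned later in this thread); note that the aggregated column above would actually come back as the corresponding array type, which would need the same treatment.

const { Pool, types } = require('pg')
const pool = new Pool()

async function registerPartialCartRowParser () {
  // Find the OID that Postgres assigned to the declared composite type...
  const { rows } = await pool.query(
    "SELECT oid FROM pg_type WHERE typname = 'partial_cart_row'"
  )
  // ...and attach a parser to it, so every result column of that type is
  // converted automatically instead of arriving as a raw literal string.
  types.setTypeParser(rows[0].oid, value => parseCompositeLiteral(value))
}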
I've started working on a library to automatically parse values of custom types (domains, composites, enums, etc.) whenever possible based on the Postgres system catalog tables.
For now, if you want to parse a composite value, you could try this out: https://github.com/boromisp/postgres-composite. It can turn a composite literal into an array of literals.
how does it compare to https://github.com/vitaly-t/pg-tuple ?
They seem to be basically the same thing, with slightly different APIs.
I hadn't found pg-tuple when I put together postgres-composite.
pg-tuple doesn't seem to handle NULL fields correctly. For example, (,"",) should be parsed as NULL, '', NULL, not as '', '', ''.
postgres-composite, as https://github.com/bendrucker/postgres-array, should work just fine with any array literal. parse from postgres-composite returns Iterable<string | null> instead of an array of strings.
That said, I haven't yet tried postgres-composite with real database output, only run it through some synthetic tests.
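For what it's worth, a tiny usage sketch based on the behaviour described above (assuming the parse export and iterable return type mentioned there):

const { parse } = require('postgres-composite')

// Each field comes back as a string, or null for SQL NULL:
console.log([...parse('(,"",)')])           // [ null, '', null ]
console.log([...parse('(42,"a, b",xyz)')])  // [ '42', 'a, b', 'xyz' ]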
I have drafted a simple parser based on postgres-composite: https://github.com/FbN/pg-tuple-types
Then, based on pg-tuple-types, I have implemented a simple Knex extension that helps with eager loading of related data in Postgres.
So in PostgreSQL one can do something like:
Which returns post as a composite type. Currently, this package returns:
Ideally, it should return something like:
I know I can achieve this by doing such query:
But frankly, I am not sure why I would want to convert it to JSON and back just to get it over the wire in the correct structure.
I have seen some other issues about composite types, but from my short exploration this seems doable. It seems PostgreSQL exposes the necessary information, though it might require additional queries, which could be cached, I believe.
I tried the following script:
Results:
As you can see, it is possible to get both names and type IDs for the values inside post. With some recursion, this could be converted, no? Information on how to parse the tuple itself is here. And then we would have to parse each individual element.
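To sketch what that last step could look like (a hypothetical helper following the quoting rules of the row-value text output; a real implementation such as postgres-composite handles more edge cases):

// Split a composite literal like '(a,"b,c",,)' into ['a', 'b,c', null, null].
// Unquoted empty fields are NULL; inside quotes, "" and \" escape a quote.
function splitComposite (literal) {
  const body = literal.slice(1, -1) // drop the surrounding parentheses
  const fields = []
  let current = ''
  let quoted = false
  let wasQuoted = false
  const push = () => {
    fields.push(!wasQuoted && current === '' ? null : current)
    current = ''
    wasQuoted = false
  }
  for (let i = 0; i < body.length; i++) {
    const ch = body[i]
    if (quoted) {
      if (ch === '"' && body[i + 1] === '"') { current += '"'; i++ }
      else if (ch === '\\') { current += body[++i] }
      else if (ch === '"') { quoted = false }
      else { current += ch }
    } else if (ch === '"') {
      quoted = true
      wasQuoted = true
    } else if (ch === ',') {
      push()
    } else {
      current += ch
    }
  }
  push()
  return fields
}

// The per-field strings would then still need to be converted according to the
// attribute type OIDs fetched from the catalog, as discussed above.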