PostgREST / postgrest

REST API for any Postgres database
https://postgrest.org
MIT License
22.71k stars 1.01k forks source link

Automatically find the path between tables #994

Closed skinkie closed 6 years ago

skinkie commented 6 years ago

Split from #987

As user I expect that postgrest is able to infer the shortest path between two tables using foreign key relations via the catalogue. At this moment the user would need to mention the joins manually.

create table one(oneid text not null, primary key(oneid));
create table two(twoid text not null, primary key(twoid), oneid text references one(oneid));
create table three(threeid text not null, primary key(threeid), twoid text references two(twoid));
grant select, references on one to testuser;
grant select, references on two to testuser;
grant select, references on three to testuser;
curl -X GET "http://localhost/one?select=oneid,two(twoid)" -H "accept: application/json" ; echo
[]
curl -X GET "http://localhost:3000/two?select=twoid,one(oneid)" -H "accept: application/json" ; echo
[]
curl -X GET "http://localhost:3000/one?select=oneid,three(threeid)" -H "accept: application/json" ; echo
{"message":"Could not find foreign keys between these entities, No relation found between one and three"}

Currently it is required to explictly add:

http://localhost/one?select=oneid,two(twoid,three(threeid))
ruslantalpa commented 6 years ago

what it the usecase for this (specific example)? provide also a sample output shape can you write manually a query that returns this type of response?

skinkie commented 6 years ago

@ruslantalpa would the following answer your question?

The use case would be to completely ignore the table schema. And trust that inference is enough to build the query.

SELECT
    tc.table_name, kcu.column_name,
    ccu.table_name AS foreign_table_name,
    ccu.column_name AS foreign_column_name
FROM
    information_schema.table_constraints AS tc
    JOIN information_schema.key_column_usage
        AS kcu ON tc.constraint_name = kcu.constraint_name
    JOIN information_schema.constraint_column_usage
        AS ccu ON ccu.constraint_name = tc.constraint_name
WHERE constraint_type = 'FOREIGN KEY';
      table_name       |        column_name         |  foreign_table_name   | foreign_column_name
-----------------------+----------------------------+-----------------------+---------------------
 two                   | oneid                      | one                   | oneid
 three                 | twoid                      | two                   | twoid

This creates the path:

two   -> oneid -> one
three -> twoid -> two

Inference will give:

one <- oneid <- two
two <- twoid <- three

Naively permutate over all results, but ignore same start and end points.

two   -> one one -> two (invalid)
three -> two two -> one
one   <- two two -> one (invalid)
one   <- two two <- three

In a more advanced example to infer relations using multiple tables resolving in the table above.

select * from one join two on (one.twoid = two.id) join three (three.twoid = two.id); 
ruslantalpa commented 6 years ago

what it the usecase for this (specific example)?

i was asking for a specific example, not theoretical things like "ignore schema". something like "i want to make a call, and get this data, in this shape because ... and i can not use the current interface because ..." your reply is more in the sense "i don't like the current url format" in a reasonably large schema (even from 10+) tables start being related in different ways and there are multiple paths between two poins so "shortest" path becomes very ambiguous and its a recipe for multiple bugs (you expect one path, and another one is used). it's the same thing with using * in calls instead of listing all columns.

can you write manually a query that returns this type of response?

the query you provided is very far from the the types of queries postgrest generates although this is the last thing to discuss here, the more important question here is the usecase.

skinkie commented 6 years ago

Let's go for real life examples then. Consider the schema below, which is basically NeTEx / Transmodel.

Given that I want to show all DestinationDisplays available for a single Line. PostgREST does not allow me to filter on embedded attributes. The query would become:

/line?publiccode=eq.1&select=route(id,servicejourneypattern(id,destinationdisplay(*)))

But ideally I would like to write:

/line?publiccode=eq.1&select=destinationdisplay(*)

Or even better:

/destinationdisplay?line.publiccode=eq.1

Similar for getting the route of a ServiceJourney.

/servicejourney?id=in.1&select=*,servicejourneypattern(id,route(id,pointonroute(order_,routepoint(*))))

Ideally I would like to write:

/servicejourney?id=in.1&select=*,routepoint(*)
create table DestinationDisplay (id text not null, primary key(id), Name text not null, ShortName text not null);

create table Line (id text not null, primary key(id), Name text, TransportMode text, PublicCode text, Monitored boolean, VetagLineNumber text, LinePlanningNumber text);

create table Route (id text not null, primary key (id), linerefid text, foreign key(linerefid) references Line(id));

create table RoutePoint (id text not null, primary key (id), x integer, y integer);

create table PointOnRoute (id text not null, order_ integer not null, primary key (id, order_), foreign key(id) references Route(id), routepointrefid text, foreign key (routepointrefid) references RoutePoint(id));

create table ServiceJourneyPattern (id text not null, primary key(id), Name text, RouteRefId text, foreign key (routerefid) references route(id), destinationdisplayrefid text, foreign key (destinationdisplayrefid) references DestinationDisplay (id));

create table ServiceJourney (id text not null, primary key(id), DepartureTime time not null, DepartureDayOffset integer, ServiceJourneyPatternRefId text not null, foreign key (ServiceJourneyPatternRefId) references ServiceJourneyPattern(id));
begriffs commented 6 years ago

Thanks for providing schema details. I created the db locally and pointed postgrest at it. Confirmed your request /line?publiccode=eq.1&select=route(id,servicejourneypattern(id,destinationdisplay(*))) completes without an error, although it returned no results because I haven't loaded any data inside.

I must admit that inferring the foreign-key hops between distant tables seems like asking for trouble. What if you have more than one way to connect the tables, such as when there is a diamond where A refers to B and C, and both of those refer to D? I'm thinking that asking to embed D inside A might return different results depending on the route chosen.

Also the inference would make it harder to tell why urls break if the schema changes. If your url explicitly lists the path from one table to the other then severing a link should give a specific error message about what went wrong. Whereas with inference the error message would say only that no path was detected.

skinkie commented 6 years ago

If you are interested I could obviously you with a PostgreSQL dump. If you would consider "asking-for-trouble" problematic because there might eventually exist a use-case that is ambiguous, then what about returning the result as a HTTP redirect as a form of "advanced query builder"?

begriffs commented 6 years ago

You're saying if there were more than one route for the shortcut the server would return HTTP 300 Multiple Choices with perhaps multiple Location headers for the options?

skinkie commented 6 years ago

Actually I didn't even know that was possible :) But my suggestion would be if you feed the system a query where a path does exist and it is unique it would do a HTTP 302 with the full query that would be required to get the result. If you say HTTP 300 might facilitate the ambiguous case, that might indeed be a very clear way to give the user an informed choice.

begriffs commented 6 years ago

I think this HTTP interface we've discussed makes sense: a simple redirect when the path is unique, or a disambiguation when it isn't. Now the amount of work it might take in the code could be pretty substantial and possibly difficult to get right. I'm personally not keen to implement it myself, but would consider reviewing a PR if it looked like it didn't make the code way more complex.

At least as it stands you aren't prevented from getting the results you need, you just have to construct an explicit path.

jackfirth commented 6 years ago

FYI, multiple Location headers in a 300 Multiple Choices response isn't RFC 7231 compliant because section 6.4.1 explicitly states that only one such header may be provided:

If the server has a preferred choice, the server SHOULD generate a Location header field containing a preferred choice's URI reference. The user agent MAY use the Location field value for automatic redirection.

To provide multiple choices, the server should define a content type (such as application/vnd.postgrest.route-choices+json) that represents a list of links for the client to choose from. This lets you include arbitrary metadata about those links in the response, so postgrest could tell clients details about the different foreign key relationships for each route.

skinkie commented 6 years ago

@begriffs thanks, then I'll put it on my never ending todo list :-)

ruslantalpa commented 6 years ago

even if one could infer a path between two tables, the shape of the response would still include all the intermediate tables, it's just how postgrest works and this is unlikely to change. in this case, there will be a disconnect between the url and the shape of the response so ppl will start complaining that it's a bug. in short, it's just not worth it, it a few keystrokes that you write once (when you define the call url in your code).

skinkie commented 6 years ago

I am happy to agree to disagree.