PostgREST / postgrest

REST API for any Postgres database
https://postgrest.org
MIT License
23.42k stars 1.03k forks source link

Split off openapi into postgrest-contrib #1698

Open wolfgangwalther opened 3 years ago

wolfgangwalther commented 3 years ago

Problem

OpenAPI support is not up to date: We support v2, but v3 is out there for some years now (#932). Currently we have 26 issues open that are tagged "OpenAPI".

I'm sure some of those issues are not that hard to solve in theory. I feel like having the OpenAPI code in haskell is quite a hurdle for community contributions, though. The majority of our users probably does not know haskell at all - but would be able to make contributions in SQL.

Test coverage seems to be quite low for that area, too. I recently played a bit with #1653 - but was surprised to see that I didn't get any failing tests after changing some things.

Suggestion

I suggest to split off all the openapi related code. Remove it from core postgrest, put it in a separate (still official) postgrest-contrib repo and turn all the code into SQL. We already have the db-root-spec config option, to be able to use it.

Then we ship PostgREST without openapi support at all. In the docs we are very clear about how to add openapi support from our "official extension". It's basically just running a SQL script to create the function(s) + setting the config option. Without this config option, the root endpoint does not return anything (or maybe a hint on how to add openapi support, if that's better).

I see the following advantages of this:

The postgrest-contrib repo could have other SQL-only stuff, too. E.g. #1690 and other SQL snippets we are suggesting in the docs right now. We can have several distribution formats for this repo, e.g. as pg extension or just as one big SQL file (for cloud hosted pg servers). Users can install just some parts of this or everything - and then just enable those functions in the postgrest configuration.

Having this in a separate repo should make it a bit easier to track on the issue tracker. Also, while some SQL parts might be postgrest-version dependent, most parts while likely be usable across more versions. Having a different repo (and multi-version compatibility in mind) should make it easier to perform major version upgrades of postgrest - without the need to update the SQL code at the same time.

The biggest challenge I see will be making that openapi output reflect what PostgREST really does. Parts of the output are only reflecting the database schema and are unrelated to PostgREST itself (think db comments as an example), but others (think embedding opportunities - if those could be part of the openapi output, that would be great) are heavily dependent on PostgREST itself. To sync this, we could pass (a part of) the schema cache via JSON parameter to the openapi function, to make it aware of what PostgREST knows about the database. We'd still have to work out the details, of course.

steve-chavez commented 3 years ago

@wolfgangwalther Sounds great! I agree overall. Even if we solve the current issues more would keep coming considering that OpenAPI is still evolving. So It'd be great to do openapi through SQL.

Until now, the blocker for this was how to pass the schema cache info to db-root-spec. Before I thought we'd need to offer our schema cache in SQL for this. But now you're proposing we pass it as an argument to the db-root-spec function right? I think that could work.

One thing though, should we keep the "crippled" output for users that don't want a more complete OpenAPI? Perhaps we could do this for some versions until our SQL OpenAPI gains more acceptance.

I say this because we got some feedback on https://github.com/PostgREST/postgrest/pull/1511#issuecomment-709195912, I think most users would prefer postgrest stay low on the db state it needs.

steve-chavez commented 3 years ago

It seems that this would be done in part by #1697, allowing the DbStructure to be simple json. We'd then pass that json(on Haskell) to db-root-spec on each / request. One thing great about that is that db-root-spec would be always up-to-date with our schema cache.

steve-chavez commented 3 years ago

I wonder if we could pass a structured type instead of a big json to db-root-spec. Perhaps we could cast the json to a custom type. That way it'd be easier to work the openapi output.

wolfgangwalther commented 3 years ago

Until now, the blocker for this was how to pass the schema cache info to db-root-spec. Before I thought we'd need to offer our schema cache in SQL for this. But now you're proposing we pass it as an argument to the db-root-spec function right? I think that could work.

Yes, although I would like to limit the amount of data passed in as much as possible. I would still expect the function to make calls to the pg catalog to get most of the data. Just the key pieces that need to be in sync with postgrest, would come in via argument.

Otherwise we would create too much of a dependency of the openapi output on postgrest core again. Whereever possible, the openapi repo / code / development should not be blocked by changes needed in core.

One thing though, should we keep the "crippled" output for users that don't want a more complete OpenAPI? Perhaps we could do this for some versions until our SQL OpenAPI gains more acceptance.

I would not really keep it in deliberately... but maybe we can just do the following:

  1. Extend db-root-spec to get data passed in via argument. This enables the use of our custom proc.

  2. Develop the SQL-only openapi output.

  3. Make a release of postgrest.

  4. Rip out the openapi schema from core.

This way we'd have the benefits of simplicity for postgrest core immediately for the nightly. The latest released version will still have the old openapi output in it, but people can already use the new SQL func with it. So we could already update the docs to recommend that.

So basically.. I'd keep it in for max 1 release, but with smart timing.

I say this because we got some feedback on #1511 (comment), I think most users would prefer postgrest stay low on the db state it needs.

Yes, this kind of feedback is actually why I came up with the idea. I find this solution much less invasive as the original lib.sql / pg_postgrest extension idea. Conceptually the difference is now that we never require any of that SQL code for postgrest to work with basic functions - while the postgrest extension was required in previous ideas.

Still - if we're going to make the openapi output an optional extension, we can do it right away. I don't think we need to keep it in for several versions in different quality.

It seems that this would be done in part by #1697, allowing the DbStructure to be simple json. We'd then pass that json(on Haskell) to db-root-spec on each / request. One thing great about that is that db-root-spec would be always up-to-date with our schema cache.

Yes, that's really good. Looking at some of the numbers Remo posted in that PR, we might need to encode the DbStructure to JSON only once immediately when we update the schema cache. Otherwise we'd need to encode it on every / request, which might be costly...

I wonder if we could pass a structured type instead of a big json to db-root-spec. Perhaps we could cast the json to a custom type. That way it'd be easier to work the openapi output.

Hm. We could stay flexible here, I think. If we just say the first unnamed argument will get the dbStructure JSON via json_to_record, then this would automatically do that conversion to whatever composite type we have defined on the DB side, right? Isn't that the same way we call RPCs now?

By using an unnamed argument, we'd still have all named arguments unused for input from the query-string.

jsommr commented 3 years ago

Can't wait for OpenAPI to get removed. It was a great idea but with no one stepping up to maintain it (and few OpenAPI Haskell libraries) it just makes it seem like PostgREST has more issues than it actually does.

steve-chavez commented 3 years ago

I find this solution much less invasive as the original lib.sql / pg_postgrest extension idea

In regards to invasiveness, I wonder if we could extend PostgREST through an embedded scripting language such as Dhall and be even less invasive to the database. Another Haskell project that supports an scripting language is Pandoc, which supports Lua(examples).

Some pluses of Dhall:

Big con:


Just an idea I wanted to get off my head.

tonyxiao commented 1 year ago

Totally onboard with this. Right now we are resorting to proxying postgrest and modifying the root response with our own cusotmizations via typescript. Would love to contribute to the oss but brushing up on haskell and getting that whole dev env setup does not feel appealing at all when I have a deadline to meet 😂

tonyxiao commented 1 year ago

Would also enable alternative formats such as https://www.buildwithfern.com/

steve-chavez commented 1 year ago

Right now we are resorting to proxying postgrest and modifying the root response with our own cusotmizations via typescript.

@tonyxiao Just FYI, it's possible to override the openapi spec with a custom function using the db-root-spec config. See https://github.com/PostgREST/postgrest/discussions/2673#discussioncomment-5103648

tonyxiao commented 1 year ago

ah good to know, thanks for the tip!

wayland commented 1 year ago

Hi!

I've started some work on this, but I have some questions:

Assumptions:

Thanks!

steve-chavez commented 1 year ago

Hey @wayland!

Those are great news! @laurenceisla Also started work here on https://github.com/PostgREST/postgrest-openapi. Perhaps you'd like to collaborate? The extension is still at early stages.

Is there a way I can configure PostgREST to call one of my functions when the config is updated?

Yes, check https://postgrest.org/en/stable/references/api/openapi.html#override-openapi

Are we OK with adding other Postgres extensions? I'm looking at https://github.com/petere/pguri but am also curious whether plperl would be an acceptable option.

Ideally the extension would be pure SQL so it's compatible with cloud providers. plperl is available on most I think, so maybe it's fine.

As a follow-up, where are we storing the doco for the openapi stuff? My instinct says a docs folder in postgrest-contrib, but happy to be guided.

Yeah, docs for the extension would be fine on the repo README. Currently this is missing since it's still a prototype but you can check the tests for some examples.

wayland commented 1 year ago

Oh, great! He's gotten further than I have; I'll see what I can do that helps, though I may not get onto it until next weekend now.

Regarding the docs, I was planning that we take the existing ones from postgres-docs, and go from there.

Thanks! :)

wolfgangwalther commented 4 months ago

I see work has started in https://github.com/PostgREST/postgrest-openapi. Splitting this off into a separate repo was my original suggestion. However, now that we have merged postgrest-docs back into the main repo, I wonder whether a separate repo is the best approach?

We could also develop that SQL-based extension in the core repo. Similar to how PostgreSQL has their contrib/ folder in the main repo.

This would give us a much better base for nix-based tooling, for instance. Also, the extension would be much more visible, including the issues we already have in the issue tracker.

What would be the pros and cons for you here, @laurenceisla @steve-chavez?

steve-chavez commented 4 months ago

I don't have any objections on that. I think a contrib folder would be good too.

The only thing is that postgrest-openapi is not usable yet AFAICT (perhaps @laurenceisla can clarify), but maybe having it here with more eyes would make it progress quicker.

laurenceisla commented 4 months ago

The only thing is that postgrest-openapi is not usable yet AFAICT

Yes, it's not completely ported. The most important missing features are functions support, permissions and FK relationships (the last one maybe won't be included in the initial release).

This would give us a much better base for nix-based tooling, for instance. Also, the extension would be much more visible, including the issues we already have in the issue tracker.

I agree, it would be better to keep it all together, since the latest changes in core PostgREST can be tested with the library more promptly.

I don't have experience in maintaining contrib libraries alongside core. It would mean that if a change in PostgREST core postgrest-openapi, then would the releases stop and would we need to fix the library before doing so?

If the above is not a problem, then I think we could migrate it as-is to get the benefits that you mentioned.

wolfgangwalther commented 4 months ago

I don't have experience in maintaining contrib libraries alongside core. It would mean that if a change in PostgREST core postgrest-openapi, then would the releases stop and would we need to fix the library before doing so?

I think that's a good thing. I don't think we should "break the openapi contrib and then stop releasing". Instead we should treat it just like the docs - if you make a change, you need to make sure that the openapi part still works - in the same commit / PR. Don't merge when breaking stuff in the first place.

After all the idea to split this off into a SQL-based contrib piece is mostly about making it better, by making it easier to maintain and easier to contribute to. This is not about making it "less official".

If the above is not a problem, then I think we could migrate it as-is to get the benefits that you mentioned.

I'd not migrate it as is. I'd treat the current repo as some kind of PoC, where we can (and do) try things out for now, until we have a working replacement - with the ultimate goal of merging a working solution to the core repo.