Materials-Consortia / OPTIMADE

Specification of a common REST API for access to materials databases
https://optimade.org/specification
Creative Commons Attribution 4.0 International
83 stars 37 forks source link

Questioning the value of OpenAPI schemas in this repo #434

Closed ml-evs closed 8 months ago

ml-evs commented 1 year ago

The OpenAPI schemas in this repo somehow provide the OpenAPI representation according to optimade-python-tools at the moment of the releases of the specification.

Currently this is quite out of date with the versions at https://github.com/Materials-Consortia/schemas, and in fact, requiring the OpenAPI schemas to be correct on release is quite an additional burden for optimade-python-tools. (see https://github.com/Materials-Consortia/optimade-python-tools/pull/1427 and the many PRs before it that show my attempts to keep 1.1 compatibility whilst generating the 1.2 schema).

I will make a PR here with the latest draft 1.2 OpenAPI schema from optimade-python-tools, but I would suggest in future that we remove ./schemas from this repository and instead version them more explicitly in the schemas repo (and link to them from here). I will try to remember to bring this up next meeting...

merkys commented 1 year ago

AFAIR, this is somewhat of a chicken-and-egg problem, arising due to the fact that OpenAPI schemas are generated from optimade-python-tools source. I myself see the value of OpenAPI schemas and would prefer having them alongside the specification as means to validate responses, generate UIs and even to dispel some specification ambiguity (hopefully). However, I have no idea how to make their development easier. I like the staging branch approach, but I get why it is a burden.

ml-evs commented 1 year ago

Would you be against adding a link in the specification for an authoritative source for the OpenAPI schemas? (i.e. schemas.optimade.org or https://github.com/Materials-Consortia/schema, we could even automatically release them on Zenodo and provide DOIs).

merkys commented 1 year ago

I liked having specification and schemas in the same repository under the same version number, but I guess separating them wouldn't be too much harm. Would it be possible to retain the same version numbering scheme? Otherwise it might be difficult to track the correspondence.

Out of plain curiosity, how does the split help to reduce the burden?

rartino commented 1 year ago

I understand that we are talking about quite a bit of work, but, are we not expected to provide valid schemas "from day one" when we release a new version? I'd rather take more generic schemas, even down to just bare-bones validating essentially just that the response is on jsonapi form, than making releases with no schemas at all.

With that said -- my plan here was that we were meant to create property definitions for all data entries we have standardized so far. After that it should be fairly straightforward to auto-generate proper OpenAPI schemas right from what we have in the specification repo without going via optimade-python-tools. (In fact, I'd have done it already if not the post-corona wave of lesser sicknesses kept me from doing anything but the most time critical work, yada, yada excuses.)

What time plan are we on for the v1.2 release? The above still seems doable with reasonable effort if I put some time towards it.

ml-evs commented 1 year ago

I understand that we are talking about quite a bit of work, but, are we not expected to provide valid schemas "from day one" when we release a new version? I'd rather take more generic schemas, even down to just bare-bones validating essentially just that the response is on jsonapi form, than making releases with no schemas at all.

Sure, and there will be a valid schema for 1.2 from day one, but our current approach leaves us with no space for schema-only bug fixes. Maybe this isn't critical, but I can foresee that the optimade-python-tools schema will be buggier this time around given the additional spec changes. As you suggest, it is only currently a lot of work because the schema requires the entire server to be set up with all possible features --- I wouldn't be against replacing the ones we currently release with more generic versions in the spec repo (then the schemas.optimade.org can give our best attempt at a full schema).

With that said -- my plan here was that we were meant to create property definitions for all data entries we have standardized so far. After that it should be fairly straightforward to auto-generate proper OpenAPI schemas right from what we have in the specification repo without going via optimade-python-tools. (In fact, I'd have done it already if not the post-corona wave of lesser sicknesses kept me from doing anything but the most time critical work, yada, yada excuses.)

I get lost on whether I mentioned this in the other issue thread, but if you have not yet started this (hope you are feeling better, of course!) then what we have in https://github.com/Materials-Consortia/optimade-python-tools/pull/1425 might be a helpful starting point. I will be generating the new-style /info/structures response for the standardized fields soon too so we can see if we agree :sweat_smile:

What time plan are we on for the v1.2 release? The above still seems doable with reasonable effort if I put some time towards it.

As I posted in my other comment, it's up to us (did we confirm that you and I @rartino who are officially "release managers" a few months ago?), but I guess if we want to do this work first then a release candidate soon (now, or just after next meeting?) with a real release (or further rc) after our first 2023 meeting could be a sensible timeline?

rartino commented 1 year ago

Sure, and there will be a valid schema for 1.2 from day one, but our current approach leaves us with no space for schema-only bug fixes.

I don't think we should be afraid to fix schema-only bugs with point releases. It shouldn't be cumbersome - these releases would just replace the schema files and nothing else. Given that we keep track of schemas on https://shemas.optimade.org by OPTIMADE version number, I think we actually need them to have their own version numbers.

It has also been argued that our schemas help clarify what we mean in the specification text and thus are an integral part of the specification. This view further establishes that schema bugs are specification bugs that motivate new point releases.

but I guess if we want to do this work first then a release candidate soon (now, or just after next meeting?)

Since you mention we may want to discuss next meeting what actually goes into the release; I think the earliest we can have a release candidate is right after that meeting. I'll attempt to at least start looking at this before that.

what we have in https://github.com/Materials-Consortia/optimade-python-tools/pull/1425 might be a helpful starting point.

Maybe I am misunderstanding, but I really think it is far more work associated with supporting property definitions in optimade-python-tools than it is to turn our current chapter 8 into property definitions and write a script that inserts them into otherwise barebones json-schema for OPTIMADE responses.

On that note, what makes the most sense for creating those barebones schemas? Use the current optimade-python-tools generated ones, or just grab http://jsonapi.org/schema and manually edit it to add the OPTIMADE conventions?

ml-evs commented 8 months ago

Closed by #502 and related PRs -- at some point we may catch up and generate OpenAPI schemas for future versions, but this is not a priority.