Materials-Consortia / OPTIMADE

Specification of a common REST API for access to materials databases
https://optimade.org/specification
Creative Commons Attribution 4.0 International
83 stars 37 forks source link

Handle OpenAPI Schemas for v1.2.0 #502

Closed rartino closed 8 months ago

rartino commented 8 months ago

We need to decide and implement some handling of OpenAPI schemas for the v1.2.0 release.

The new framework with property definitions that hopefully soon will be merged #445 generate JSON Schemas for OPTIMADE, but not full OpenAPI schemas. It is possible to extend that framework to do this. On the other hand, optimade-python-tools will likely anyway not use them, but generate their own. It may be a better use of effort to instead extend the process_schemas tool to generate pydantic declarations that can be used for that.

Hence, a proposal is that we:

  1. For now, just clean out references to OpenAPI schemas in the repo.
  2. Add functionality to generate pydantic declarations for optimade-python-tools from property definitions
  3. At some point in the future: work towards adding back OpenAPI schemas generated directly out of the property definitions in the same way we generate JSON Schemas from property definitions.

A related old issue that should be fixed if we handle this is #434. (Also: pinging @ml-evs as I hope this issue addresses a concern posted in #445).

ml-evs commented 8 months ago

I think probably this needs to be properly discussed on Friday. I am fine with the proposal and that the OpenAPI schemas should not hold up the release. Although they are nice to have, I am not aware of anyone using them --- they simply came "for free" when writing the optimade-python-tools server so at that point it was worth contributing them back.

The current status of the optimade-python-tools 1.2 support is as follows:

  1. Files endpoint routes/models implemented but don't do anything
  2. All new standard properties (e.g., symmetries), response metadata fields (e.g., licenses [though not the very latest additions from a recent PR] and request_delay).
  3. Property definitions, as of their state EoY 2022.
  4. Boolean values in filters

That leaves several big things missing:

a. Partial data b. Per-entry metadata c. Verification of the new property definition format as of 2024

An additional twist is that 1-4 above were implemented for optimade-python-tools v0.25.x, which uses pydantic v1. Because of the demands of upstream servers, we had to do a significant amount of work to upgrade to pydantic v2 (which updated basically every model and our own internal schema gen/validation). Johan had done a lot of work on a) and b) but all of these changes would also need to be heavily refactored to accommodate pydantic v2 (I'm staring at 2 100+ commit, all files changed PRs for a https://github.com/Materials-Consortia/optimade-python-tools/pull/1812 and b https://github.com/Materials-Consortia/optimade-python-tools/pull/1698/, before we even get to the trajectories implementation draft https://github.com/Materials-Consortia/optimade-python-tools/pull/1845...). Given the amount of time I have available to spend on optimade-python-tools at the moment (asymptotically approaching 0) I couldn't give a timeline for when these will get done, if ever. Even if I could lock myself away for the entire OPTIMADE workshop I think there would still be things outstanding.

All of this is to say, although I fully support any attempt to generate these schemas generically (though this presumably just pushes all of this work entirely on you, @rartino 😕), the sections on partial data and per-entry metadata in responses are basically orthogonal to the new schema definitions so would require quite a bit of extra work. Maybe we can get away with it as partial data is somehow "breaking out" of the JSON:API with the JSONL responses anyway, but at the very least all of the query parameters e.g. property_ranges, would need to be added

merkys commented 8 months ago

@ml-evs

Although they are nice to have, I am not aware of anyone using them [...]

I do :smile: Both COD and TCOD implementations consult OpenAPI schemas to find out entry types and fields defined for a particular OPTIMADE version. With the advent of machine-readable property descriptions (#445) I plan to switch to them as they can communicate more OPTIMADE-specific details than OpenAPI.