airr-community / airr-standards

AIRR Community Data Standards
https://docs.airr-community.org
Creative Commons Attribution 4.0 International
35 stars 23 forks source link

Multiple schema files #594

Closed williamdlees closed 2 years ago

williamdlees commented 2 years ago

There are four copies of the schema file - two in specs at different versions of the OpenApi standard, and one each in lang/R and lang/Python. For the sake both of maintainers and users, can we please settle on one file? As a start, the code in lang could be modified to read the schema in specs, but if we are moving to v3, is there a roadmap for moving the lang code off v2 so that we only have one copy of the spec?

javh commented 2 years ago

IIRC, airr-schema-openapi3.yaml is a working copy that will replace airr-schema.yaml when the official switch to v3 is made. I don't recall the timeline, but I think v1.4 was the plan (#401).

The duplicate schema files under lang/python and lang/R are unfortunately necessary. We could probably get the python package to work without it, but the R build process requires everything to be in the same directory. Meaning, we would need some sort of custom prebuild/postbuild scripts if we didn't duplicate the schema files. So, we decided it was simpler to make copies of specs/airr-schema.yaml in lang/* and use the same solution for both the R and python packages.

There's a consistency check in the test suite, so any PR with inconsistent files can't be merged into master. Easiest thing to do is just work on the one in specs then copy it to lang/* once the PR discussion is over. v2 and v3 files have to manually be sync'd for now. Sorry about that - I may not remember what the schedule was, but I know we are behind it.

williamdlees commented 2 years ago

Thanks Jason. Knowing (as I do now) that the files are copies, if they are necessary, I don't understand why the post-processing checks can't update the copies if they are found to be out of date, rather than risking a conflict holding up the merge - or at least displaying a clear message that they need to be copied, rather than highlighting the differences. Maybe this is minor once you know the files are supposed to be identical, but would save someone else from the manual merging I did for quite a while before I realised it for myself.

bussec commented 2 years ago

FYI: I just fixed a synchronization conflict that had somehow sneaked into master in d1aa24b429496675df193f3c05d1b26af776d4ea.

williamdlees commented 2 years ago

sorry, I'm sure that was me.

javh commented 2 years ago

IIRC, we discussed having Travis sync the files originally and decided against it. I don't remember the details, but I think we didn't want the risk associated with having Travis infer the "correct" file based on the timestamp and do an automatic commit/push to the repo.

If helps, the copies in lang are basically source code for the R/python packages. We want them to match what's in specs once they hit master so that the schema version and package versions are identical. There are instances where you may want what's in lang to differ from specs in a working branch (testing new features, fixing old schema support), but I can't think of an occasion where it'd be appropriate for master.

Sorry for the confusion. We should've made this process, and the repo structure, more clear.

PS: I promise this is will not be the only annoying thing about having both a python and R package in the same repo...

williamdlees commented 2 years ago

Thanks Jason. Going to close this, then, as there's no simple answer and I'm sure it will get better over time.