airr-community / airr-standards

AIRR Community Data Standards
https://docs.airr-community.org
Creative Commons Attribution 4.0 International

management and conversion of schema changes for users #582

Status: Open. schristley opened this issue 2 years ago.

schristley commented 2 years ago

The v1.4 schema introduces some significant structural changes, e.g., single-value fields turned into objects. These are significant enough that programs will get runtime errors if their code assumes one version or the other. I'm wondering whether, as a standards org, we can be more user-friendly and proactive in helping users manage these changes. Here are some user issues that I can imagine:

Right now, the Python/R libraries, as currently designed, only operate on the current schema version.
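A minimal sketch of the defensive code users end up writing when a field changes shape between versions: an accessor that accepts either a plain string or an {"id": ..., "label": ...} object, the shape AIRR uses for ontology terms. The field name `example_unit` and the id `EX:0001` are hypothetical placeholders, not fields or terms from the actual v1.4 schema.

```python
from typing import Any, Optional


def get_label(record: dict, field: str) -> Optional[str]:
    """Return a readable value for `field`, whether it is stored as a plain
    single value (older style) or as an {"id", "label"} object (newer style)."""
    value: Any = record.get(field)
    if value is None:
        return None
    if isinstance(value, dict):
        # Structured form: prefer the label, fall back to the id.
        return value.get("label") or value.get("id")
    # Single-value form: use it as-is.
    return str(value)


# Hypothetical records for the same field under two schema versions.
old_style = {"example_unit": "day"}
new_style = {"example_unit": {"id": "EX:0001", "label": "day"}}

print(get_label(old_style, "example_unit"))  # day
print(get_label(new_style, "example_unit"))  # day
```

Code that indexed `record["example_unit"]` directly as a string would break on the second record; the accessor hides the difference, at the cost of every tool having to write something like it.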

Ideas/thoughts?

scharch commented 2 years ago
> 1. Design the Python/R libraries to support read/write/validate of multiple schema versions.
> 2. Maintain separate schema files for older versions, e.g., airr-schema-v1.3.1.yaml. We'd need to decide when and how to do this.
> 3. Add conversion routines to the Python/R libraries, with callbacks (i.e., calls to user-defined functions), so users can add custom conversion code, e.g., to translate values or do ontology lookups.

Definitely yes for 1 and 3; I think those probably require 2 as well, but that part is a little more opaque to me.
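The conversion-with-callbacks idea (proposal 3) could look roughly like this in Python. This is a hypothetical sketch, not part of the airr library: `RecordConverter`, `register`, `to_ontology_term`, `lookup_term`, and `LOOKUP` are made-up names, and `example_unit` stands in for a field that changed from a single value to an object. Unregistered fields are copied unchanged, and users plug in their own functions for value translation or ontology lookups.

```python
from typing import Any, Callable, Dict

# A field converter takes the old value and returns the new value.
FieldConverter = Callable[[Any], Any]


class RecordConverter:
    """Hypothetical old-version -> new-version record converter with
    per-field user callbacks (not an existing airr API)."""

    def __init__(self) -> None:
        self._converters: Dict[str, FieldConverter] = {}

    def register(self, field: str, fn: FieldConverter) -> None:
        # Users can add or override the conversion for any field.
        self._converters[field] = fn

    def convert(self, record: Dict[str, Any]) -> Dict[str, Any]:
        out: Dict[str, Any] = {}
        for field, value in record.items():
            fn = self._converters.get(field)
            # Fields with no registered converter are copied unchanged.
            out[field] = fn(value) if fn is not None else value
        return out


def to_ontology_term(value: Any) -> Any:
    # Default: wrap a bare value in an {"id", "label"} object. The id is left
    # empty because mapping a label to an ontology id needs a lookup.
    if value is None or isinstance(value, dict):
        return value
    return {"id": None, "label": str(value)}


# A user-supplied table standing in for a real ontology lookup.
LOOKUP = {"day": "EX:0001"}  # hypothetical id


def lookup_term(value: Any) -> Any:
    term = to_ontology_term(value)
    if isinstance(term, dict) and term.get("id") is None:
        # Build a new dict so the caller's input record is never mutated.
        term = {**term, "id": LOOKUP.get(term.get("label"))}
    return term


converter = RecordConverter()
converter.register("example_unit", lookup_term)
result = converter.convert({"example_unit": "day", "sample_id": "S1"})
print(result)
# {'example_unit': {'id': 'EX:0001', 'label': 'day'}, 'sample_id': 'S1'}
```

Keeping callbacks per field means the library would only need to ship sensible defaults, while site-specific knowledge (local vocabularies, ontology services) stays in user code.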

javh commented 2 years ago

For 1-2, I'm concerned about the effort we'll have to dedicate to supporting multiple schema versions in the reference libraries. It'll make the code messier and harder to maintain, depending upon how large the changes are. We did this with changeo, so we could support both the AIRR schema and the old Change-O schema. It works, but it's a huge pain and I kind of regret it, even though it's mostly just column renames.

The docs, schema, and R/Python libraries are all tied to the git repo tags, so we could set up a v1.3 maintenance branch for patches to everything at v1.3. That way we could continue to support v1.3 as needed, without adding support for older versions to the current libraries and without maintaining multiple schema files in master.

3 seems like a good idea to me. If we want people to make the switch, then we should enable them to do so. In retrospect, this is what I wish we had done with changeo and all the R packages, i.e., switched over to the AIRR Rearrangement schema natively, written a Change-O to AIRR conversion script, and called it good.
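Since the Change-O to AIRR switch was mostly column renames, a conversion script can be little more than a header rewrite. A minimal sketch, assuming tab-delimited input; `COLUMN_MAP` is a small illustrative subset, not the full Change-O mapping, and a real converter would also need to translate values whose type or encoding changed.

```python
import csv
import io
from typing import Dict, TextIO

# Illustrative subset of a Change-O -> AIRR column mapping (not complete).
COLUMN_MAP: Dict[str, str] = {
    "SEQUENCE_ID": "sequence_id",
    "V_CALL": "v_call",
    "D_CALL": "d_call",
    "J_CALL": "j_call",
    "JUNCTION": "junction",
}


def convert_columns(src: TextIO, dst: TextIO, mapping: Dict[str, str]) -> None:
    """Copy a tab-delimited table, renaming header columns via `mapping`.

    Columns not in the mapping keep their original names; data rows are
    copied unchanged.
    """
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    header = next(reader)
    writer.writerow([mapping.get(col, col) for col in header])
    for row in reader:
        writer.writerow(row)


changeo = "SEQUENCE_ID\tV_CALL\tJ_CALL\nseq1\tIGHV1-2*02\tIGHJ4*02\n"
out = io.StringIO()
convert_columns(io.StringIO(changeo), out, COLUMN_MAP)
print(out.getvalue())
```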

schristley commented 2 years ago

> For 1-2, I'm concerned about the effort we'll have to dedicate to supporting multiple schema versions in the reference libraries. It'll make the code messier and harder to maintain, depending upon how large the changes are.

I'm hoping very little, but it's not effortless. I don't think the libraries ever reference individual fields the way analysis tools do. The current validation code is already general enough; it has handled schema changes so far without needing to be modified. The one exception is better support for allOf (#494).
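To illustrate why schema-driven validation tends to survive version bumps: if the validator only interprets the schema (required lists, property types) and never names a field itself, supporting another version is a matter of loading another schema. A sketch with tiny, hypothetical in-memory schemas standing in for versioned schema files; it handles only a small subset of JSON Schema-style keywords and is not the airr library's validator.

```python
from typing import Any, Dict, List

# JSON-Schema-style type names mapped to Python types (small subset).
TYPE_MAP = {"string": str, "boolean": bool, "object": dict, "array": list}

# Hypothetical, heavily simplified schemas for two versions of one object.
SCHEMAS: Dict[str, Dict[str, Any]] = {
    "1.3": {
        "required": ["sample_id", "example_unit"],
        "properties": {
            "sample_id": {"type": "string"},
            "example_unit": {"type": "string"},
        },
    },
    "1.4": {
        "required": ["sample_id", "example_unit"],
        "properties": {
            "sample_id": {"type": "string"},
            "example_unit": {"type": "object"},
        },
    },
}


def validate(record: Dict[str, Any], version: str) -> List[str]:
    """Return a list of problems (empty means valid). This function never
    names a specific field; everything comes from the selected schema."""
    schema = SCHEMAS[version]
    errors: List[str] = []
    for field in schema["required"]:
        if field not in record:
            errors.append(f"missing required field: {field}")
    for field, spec in schema["properties"].items():
        if record.get(field) is not None:
            if not isinstance(record[field], TYPE_MAP[spec["type"]]):
                errors.append(f"{field}: expected {spec['type']}")
    return errors


record = {"sample_id": "S1", "example_unit": "day"}
print(validate(record, "1.3"))  # []
print(validate(record, "1.4"))  # ['example_unit: expected object']
```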

Here's what I think is needed.

> The docs, schema, and R/Python libraries are all tied to the git repo tags, so we could set up a v1.3 maintenance branch for patches to everything at v1.3. That way we could continue to support v1.3 as needed, without adding support for older versions to the current libraries and without maintaining multiple schema files in master.

There actually is one already, because of some backporting for the ADC API, though no tags have been created on it yet.

> 3 seems like a good idea to me. If we want people to make the switch, then we should enable them to do so. In retrospect, this is what I wish we had done with changeo and all the R packages, i.e., switched over to the AIRR Rearrangement schema natively, written a Change-O to AIRR conversion script, and called it good.

I believe the issue is that pip (for Python) only allows one version of the AIRR library to be installed in an environment. Can Python import both AIRR v1.3 and AIRR v1.4 into the same running program?
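On that last question: pip installs one version of a given distribution per environment, so two versions can't normally be installed side by side. Within a single process, Python caches modules in `sys.modules` by name, so two copies can only coexist if they are loaded under different names. A minimal sketch of that mechanism with `importlib`, using throwaway single-file modules (`fakelib`, `load_as`, and `SCHEMA_VERSION` are placeholders). A real package whose internals import themselves by their original name would need more work, so treat this as an illustration of the mechanism rather than a recommended approach for the airr package.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path
from types import ModuleType


def load_as(name: str, path: Path) -> ModuleType:
    """Load the single-file module at `path` under an explicit module name."""
    spec = importlib.util.spec_from_file_location(name, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module  # register before executing, as import does
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    # Stand-ins for two copies of a library placed side by side.
    for tag in ("v13", "v14"):
        d = Path(tmp) / tag
        d.mkdir()
        (d / "fakelib.py").write_text(f"SCHEMA_VERSION = {tag!r}\n")
    old = load_as("fakelib_v13", Path(tmp) / "v13" / "fakelib.py")
    new = load_as("fakelib_v14", Path(tmp) / "v14" / "fakelib.py")

print(old.SCHEMA_VERSION, new.SCHEMA_VERSION)  # v13 v14
```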