airr-community / airr-standards

AIRR Community Data Standards
https://docs.airr-community.org
Creative Commons Attribution 4.0 International

Review of AIRR-Standards documentation #745

Open ustervbo opened 4 months ago

ustervbo commented 4 months ago

I reviewed the documentation and I have some questions, comments and suggestions on various sections of the document.

Random comments

Who is the target audience for the document? I am not a computer scientist and I read the document with the thought, 'How can I make our existing data MiAIRR compliant? How can I ensure that I gather the proper information in the future?' Here, I sometimes fall short. Not because I need to invest some time to understand, but because some parts simply seem inaccessible.

Maybe we should standardize the level names in the data model. The section 'MiAIRR-to-NCBI Implementation' uses slightly different terms for the levels. For instance, the bullet list in that section has 'diagnosis & intervention', whereas the table in 'MiAIRR Data Elements' has 'diagnosis and intervention'. 'MiAIRR-to-NCBI Implementation' has 'processed sequences with basic analysis results', which is more detailed than the 'processed AIRR sequences' used elsewhere (although 'basic analysis results' is non-descriptive). In the Nat. Comm schematic, there is no 'intervention' and the 6th level is called 'Processed Sequences with Annotations'.

The Repertoire Schema is specified as UTF-8, while the Rearrangement Schema is ASCII or UTF-8.

Sometimes we say OpenAPI V2, sometimes OpenAPI V2 and V3. (Actually, I think it's 1 all)

The section 'Repertoire Schema > File Structure' states: "The file can (optionally) contain an Info object, at the beginning of the file, based upon the Info schema in the OpenAPI V2 specification. If provided, version in Info should reference the version of the AIRR schema for the file." My understanding is that, because the Info object is optional, we may not know the schema ID. Is this a problem? What is the purpose of the optional Info object if it does not carry relevant information? If I understand the API correctly (and there is no guarantee that I am anywhere close), the schema is always returned, so the version number may be important.
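To make the concern concrete, here is a minimal sketch of what a reader of such a file has to do. It assumes the file is JSON with top-level keys `Info` and `Repertoire`; the helper name `schema_version` and the version string `"1.4"` are placeholders of mine for illustration, not taken from the documentation.

```python
import json


def schema_version(doc):
    """Return the version declared in the optional Info object, or None.

    Assumption: the Info object, when present, sits under the top-level
    key "Info" and carries a "version" entry, as in OpenAPI's Info schema.
    """
    info = doc.get("Info")
    if not isinstance(info, dict):
        return None
    return info.get("version")


# Two hypothetical files: one with the optional Info object, one without.
with_info = json.loads(
    '{"Info": {"title": "example", "version": "1.4"}, "Repertoire": []}'
)
without_info = json.loads('{"Repertoire": []}')

for name, doc in [("with Info", with_info), ("without Info", without_info)]:
    version = schema_version(doc)
    if version is None:
        print(f"{name}: schema version unknown")
    else:
        print(f"{name}: schema version {version}")
```

In the second case the reader has no way to tell which schema version the file follows, which is the situation I am asking about.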

study_description and study_contact in the Study-schema are missing in AIRR_Minimal_Standard_Data_Elements.

genotype in the Subject-schema is not really explained (and does not exist in AIRR_Minimal_Standard_Data_Elements).

Section specific comments

javh commented 4 months ago

From the call:

scharch commented 4 months ago

Related: I notice that the software standard page references "The AIRR Data Representation Working Group," which was decommissioned/folded into Standards ...pre-pandemic?? For that matter, we are tentatively planning to close up shop on the Software WG post-Porto, though the details have yet to be worked out...