ustervbo opened 9 months ago
From the call:
Related: I notice that the software standard page references the "The AIRR Data Representation Working Group," which was decommissioned/folded into Standards ...pre-pandemic?? For that matter, we are tentatively planning to close up shop on the Software WG post-Porto, though the details have yet to be worked out...
Work on the documentation has started over at @eriicdesousa's fork: https://github.com/eriicdesousa/airr-standards
I reviewed the documentation and I have some questions, comments and suggestions on various sections of the document.
Random comments
Who is the target audience for the document? I am not a computer scientist and I read the document with the thought, 'How can I make our existing data MiAIRR compliant? How can I ensure that I gather the proper information in the future?' Here, I sometimes fall short. Not because I need to invest some time to understand, but because some parts simply seem inaccessible.
Maybe we should standardize the level names in the data model. The section 'MiAIRR-to-NCBI Implementation' uses slightly different terms for the levels. For instance, 'diagnosis & intervention' is mentioned in the bullet list in that section but otherwise appears only in the table in 'MiAIRR Data Elements', where it is written 'diagnosis and intervention'. 'MiAIRR-to-NCBI Implementation' has 'processed sequences with basic analysis results', which is more detailed than the 'processed AIRR sequences' used elsewhere (although 'basic analysis results' is non-descriptive). In the Nat. Comm schematic, there is no 'intervention' and the 6th level is called 'Processed Sequences with Annotations'.
The Repertoire Schema specifies UTF-8, while the Rearrangement Schema allows ASCII or UTF-8.
Sometimes we say OpenAPI V2, sometimes OpenAPI V2 and V3. (Actually, I think it's 1 all)
My understanding of the statement "The file can (optionally) contain an Info object, at the beginning of the file, based upon the Info schema in the OpenAPI V2 specification. If provided, version in Info should reference the version of the AIRR schema for the file." in 'Repertoire Schema > File Structure' is that we may not know the schema ID. Is this a problem? What is the purpose of the optional INFO field if it does not carry relevant information? If I understand the API correctly - and there is no guarantee that I am anywhere close - the schema is always returned, so the version number may be important.
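To make the concern concrete, here is a minimal sketch of what a consumer has to do when the `Info` object is optional. It assumes a JSON repertoire file whose top-level keys are `Info` and `Repertoire`; the helper name `schema_version` and the sample version value are hypothetical and not taken from the AIRR library.

```python
import json


def schema_version(repertoire_doc):
    """Return the schema version declared in the optional Info object, or None.

    Assumption: the document is a dict with an optional top-level "Info" key
    that holds a "version" entry, as described in the quoted statement.
    """
    info = repertoire_doc.get("Info")
    if not isinstance(info, dict):
        # No Info object: the file gives no hint about its schema version.
        return None
    return info.get("version")


# Hypothetical file contents, one with and one without the Info object.
with_info = json.loads(
    '{"Info": {"title": "example", "version": "1.4"}, "Repertoire": []}'
)
without_info = json.loads('{"Repertoire": []}')

print(schema_version(with_info))
print(schema_version(without_info))
```

In the second case the reader of the file has to guess which schema version to validate against, which is the situation the question above is about.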
`study_description` and `study_contact` in the Study-schema are missing in `AIRR_Minimal_Standard_Data_Elements`.
`genotype` in the Subject-schema is not really explained (and does not exist in `AIRR_Minimal_Standard_Data_Elements`).
Section specific comments
Section: MiAIRR Data Elements
Section: MiAIRR-to-NCBI Implementation
Section: MiAIRR-to-NCBI Specification
Section: Requirement Levels of AIRR Schema Fields
Section: Metadata Annotation Guidelines
Section: AIRR Data Representations
`CellProcessing` and `NucleicAcidProcessing` data model objects belong to 'Sample Processing and Sequencing'?
Section: Repertoire Schema
`Repertoire`.
Section: Rearrangement Schema
`Rearrangement` is a sequence that describes'? (Add backticks)