Closed williamdlees closed 2 years ago
The file structure will need to match the DataFile which also has a direct correspondence to how data is returned from the ADC API.
Thanks Scott. I have added a reference to the AIRR DataFile,
Done, but can we please discuss their inclusion in the next major release.
@williamdlees There is no problem in including the Germline object in the next release of the Schema. But if we consider this to be required metadata for all studies, this would be an incompatible change of the MiAIRR standard and thus would have to wait until v2.0. Does this clarify my previous point?
MiAIRR does require a germline set reference, `germline_database` in set 5 (DataProcessing). It might be reasonable to now require that field to contain the globally unique germline reference ID, that is, `germline_set_ref` instead of being free-form text?
@schristley I agree, we should move in this direction. But IMO that's another pull request :smiley:
Added some more detail about the file structure; I think this might be ready to merge.
closes #571
Add a description of the file structure to the Germline Sets section, and fix minor typos. Add GermlineSet and GenotypeSet to the list of high-level schema objects.
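
For illustration only, here is a minimal sketch of what a file carrying these two high-level objects could look like, assuming each object type appears as a top-level key holding a list of records. The record contents and ID values are placeholders, and `top_level_objects` is a hypothetical helper, not part of any AIRR library; the documentation added in this PR is the authoritative description of the layout.

```python
import json

# Assumed layout: each top-level key names a schema object type and
# holds a list of records of that type. Field values are placeholders.
data_file = {
    "GermlineSet": [{"germline_set_id": "example-set-1"}],
    "GenotypeSet": [{"receptor_genotype_set_id": "example-genotype-1"}],
}


def top_level_objects(text):
    """Return the sorted top-level object names found in a JSON document."""
    return sorted(json.loads(text).keys())


# Round-trip through JSON text, as a reader of such a file would.
serialized = json.dumps(data_file, indent=2)
names = top_level_objects(serialized)
print(names)
```

Keying the file by object type keeps the on-disk form close to how a response grouped by object type would be consumed, which is the correspondence mentioned earlier in this thread.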