williamdlees closed this 5 years ago
Possible to provide this data in an AIRR format? I know we haven't finalized the germline schema yet, but this might be impetus to formalize it more.
I'll try to make it available in a format that reflects the schema, but probably in FASTA as well, as that's what all the tools seem to expect at the moment. Happy to reflect an AIRR standard as it emerges.
Now live at https://ogrdb.airr-community.org/sequences. The YAML reflects the schema in this repo. It could do with a header record, but I think that is the area of the definition that needs the most work. The download could be extended with some options, e.g. excluding records already incorporated into IMGT, or including only records with a minimum affirmation level.
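To make the suggested options concrete, here is a rough Python sketch of how a minimum affirmation level and an "exclude IMGT" switch might be applied to downloaded records before writing FASTA. Everything in it is an assumption for illustration: the `GermlineRecord` class, its field names (`name`, `sequence`, `affirmation_level`, `in_imgt`), the function names and the sample data are hypothetical and are not taken from the AIRR schema or from OGRDB.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class GermlineRecord:
    # Hypothetical fields for illustration only; not the AIRR schema.
    name: str
    sequence: str
    affirmation_level: int  # assumed: higher means more strongly affirmed
    in_imgt: bool           # assumed: True once the sequence is in IMGT


def filter_records(
    records: Iterable[GermlineRecord],
    min_affirmation: int = 0,
    exclude_imgt: bool = False,
) -> List[GermlineRecord]:
    """Keep records that satisfy both optional download filters."""
    kept = []
    for rec in records:
        if rec.affirmation_level < min_affirmation:
            continue  # below the requested affirmation level
        if exclude_imgt and rec.in_imgt:
            continue  # already incorporated into IMGT
        kept.append(rec)
    return kept


def to_fasta(records: Iterable[GermlineRecord]) -> str:
    """Render records as FASTA: a '>' header line, then the sequence."""
    return "".join(f">{rec.name}\n{rec.sequence}\n" for rec in records)


# Made-up records, not real germline sequences.
records = [
    GermlineRecord("example_1", "ACGT", 1, False),
    GermlineRecord("example_2", "GGCC", 2, True),
    GermlineRecord("example_3", "TTAA", 0, False),
]

selected = filter_records(records, min_affirmation=1, exclude_imgt=True)
print(to_fasta(selected), end="")
```

With both filters switched on, only `example_1` survives: `example_2` is dropped because it is flagged as already in IMGT, and `example_3` because its level is below the minimum. Keeping the filters as plain keyword arguments means they would map naturally onto optional query parameters if the download were exposed that way.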
Is it time to build in a download capacity for affirmed sequences? It may be some time before they are available via IMGT, and I don't see why people shouldn't be using them in the meantime. Perhaps later this could be expanded to allow downloading of all sequences seen by the IARC.