airr-community / ogrdb

Website and associated database for managing submissions of inferred alleles

Download set of affirmed sequences #50

Closed by williamdlees 5 years ago

williamdlees commented 5 years ago

Is it time to build in a download capacity for affirmed sequences? It may be some time before they are available via IMGT, and I don't see why people shouldn't be using them in the meantime. Perhaps later this could be expanded to allow downloading of all sequences seen by the IARC.

schristley commented 5 years ago

Would it be possible to provide this data in an AIRR format? I know we haven't finalized the germline schema yet, but this might be the impetus to formalize it further.

williamdlees commented 5 years ago

I'll try to make it available in a format that reflects the schema, but probably in FASTA as well, as that's what all the tools seem to expect at the moment. Happy to reflect an AIRR standard as it emerges.

williamdlees commented 5 years ago

Now live at https://ogrdb.airr-community.org/sequences. The YAML reflects the schema in this repo. It could do with a header record, but I think this is the area of the definition that needs the most work. The download could be extended with some options, e.g. excluding records already incorporated into IMGT, or including only records with a minimum affirmation level.
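
For illustration only, here is a rough sketch of how the two filtering options and a FASTA export might fit together. It is not the OGRDB implementation: the record keys (`name`, `sequence`, `affirmation_level`, `in_imgt`), the helper names, and the sample data are all hypothetical and are not taken from the schema in this repo.

```python
# Hypothetical record layout: these keys and values are made up for the
# example and do not come from the OGRDB or AIRR germline schema.
RECORDS = [
    {"name": "EXAMPLE_A", "sequence": "ACGT", "affirmation_level": 1, "in_imgt": False},
    {"name": "EXAMPLE_B", "sequence": "ACGTACGT", "affirmation_level": 2, "in_imgt": False},
    {"name": "EXAMPLE_C", "sequence": "GGCC", "affirmation_level": 3, "in_imgt": True},
]


def filter_records(records, min_affirmation=0, exclude_in_imgt=False):
    """Keep records at or above min_affirmation, optionally dropping
    those flagged as already incorporated into IMGT."""
    kept = []
    for rec in records:
        if rec["affirmation_level"] < min_affirmation:
            continue
        if exclude_in_imgt and rec["in_imgt"]:
            continue
        kept.append(rec)
    return kept


def to_fasta(records, width=60):
    """Render records as FASTA text, wrapping sequence lines at `width`."""
    lines = []
    for rec in records:
        lines.append(">" + rec["name"])
        seq = rec["sequence"]
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    # One newline per line; an empty record list yields an empty string.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    selected = filter_records(RECORDS, min_affirmation=2, exclude_in_imgt=True)
    print(to_fasta(selected), end="")
```

Keeping the filter separate from the formatter means the same selection could feed either the FASTA or the YAML output.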