airr-community / airr-standards

AIRR Community Data Standards
https://docs.airr-community.org
Creative Commons Attribution 4.0 International

Can the C57BL/6 IGH mouse germline set be given gene and subgroup designations? #598

Closed: schristley closed this issue 2 years ago

schristley commented 2 years ago

@williamdlees I was browsing the C57BL/6 IGH mouse germline set downloadable from OGRDB and noticed that the allele descriptions don't have gene and subgroup designations. It seems like almost all of the genes have corresponding IMGT annotations, so maybe they could be added?

Let me know if you would prefer issues related to germline sets to be submitted elsewhere.

williamdlees commented 2 years ago

Thanks Scott. I will think about the best place for germline set issues to be raised, but this will do for now!

The lack of those designations is intentional. Broadly, we don't think it's a good idea to apply the IMGT gene designations as they stand to each strain, because of the large structural variations in the IG loci between strains. We'll need genomic sequences for each strain to identify the genes. Likewise, because the subgroup designations are based on phylogenetic clustering of gene sequences, we can't derive those annotations until the genes have been established.

schristley commented 2 years ago

> The lack of those designations is intentional. Broadly, we don't think it's a good idea to apply the IMGT gene designations as they stand to each strain, because of the large structural variations that exist between strains in the IG loci.

Okay, I was thinking that C57BL/6 was much better characterized and that this applied only to the other mouse strains, but it actually sounds like it applies to all of them. I was trying to decide whether integrating these initial germline sets into VDJServer was worthwhile, but it sounds like they need to mature some more, so I will close the issue for now and revisit it later.