airr-community / airr-standards

AIRR Community Data Standards
https://docs.airr-community.org
Creative Commons Attribution 4.0 International

Added changes to MHC genotype objects per standards meeting #603

bcorrie closed this 2 years ago

bcorrie commented 2 years ago

Closes #573

bcorrie commented 2 years ago

Does anyone else (@bussec) have any thoughts on this? If not, I will fix the consistency checks and then merge... I left the consistency checks unfixed in case we made changes - didn't want to have to do them N times 8-)

bussec commented 2 years ago

@bcorrie I find MH1 and MH2 highly uncommon. In contrast to IG and TR, they are not even used in the gene symbols. I also checked again with MRO (which is used by IEDB and which we are trying to implement for the Receptor object), and while they unfortunately do not provide any top-level definition, they use the string "MHC class (I|II)" consistently. Therefore, I would opt for going with this, and I can make the necessary changes if you want.

bussec commented 2 years ago

One more thing: IMO we never had an explicit rule about this, but I just realized that most of our enums are lowercase snake_case, the main exceptions being the IG/TR locus fields and two values for sex. If we want to stay consistent with this, then mhc_class_i and mhc_class_ii would be the values of choice.
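To make the convention concrete, here is a minimal sketch of how a lowercase snake_case rule could be checked mechanically. This is a hypothetical helper, not the repository's actual consistency check; the regex and the function names `is_snake_case` and `non_conforming` are assumptions for illustration.

```python
import re

# Hypothetical rule (not the actual AIRR consistency check): a value is
# lowercase snake_case if it consists of lowercase letters/digits in
# words separated by single underscores.
SNAKE_CASE = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*")


def is_snake_case(value: str) -> bool:
    """Return True if the whole value matches the snake_case pattern."""
    return SNAKE_CASE.fullmatch(value) is not None


def non_conforming(values: list[str]) -> list[str]:
    """Return the enum values that break the lowercase snake_case convention."""
    return [v for v in values if not is_snake_case(v)]


if __name__ == "__main__":
    candidates = ["mhc_class_i", "mhc_class_ii", "non_classical_mhc",
                  "MH1", "MHC class I", "MHC-I", "IGH"]
    print(non_conforming(candidates))
    # ['MH1', 'MHC class I', 'MHC-I', 'IGH']
```

Under this rule the snake_case candidates pass, while IMGT-style terms, textbook terms with spaces, hyphenated forms, and locus codes like IGH are flagged, which mirrors the exceptions noted above.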

bcorrie commented 2 years ago

@bussec no objections whatsoever. You can make the changes if you like; I will probably get to them later this week if you don't...

javh commented 2 years ago

Spaces in terms (MHC class I) and terms people aren't likely to use in tool output/publication (mhc_class_i) are both worrisome to me. It doesn't look like there's a good solution here. I checked a few standard abbreviation lists for journals and I haven't seen anything more specific than "MHC" yet.

bussec commented 2 years ago

@javh Agreed. From my perspective, there are the following options (note that we should support invariant/non-classical MHC as well):

  1. Common textbook terms with spaces, also used in MRO: MHC class I, MHC class II, non-classical MHC
  2. Terms defined by IMGT (U Montpellier) that are not used by anyone else: MH1, MH2, RPI-MH1Like
  3. Terms defined by IMGT/HLA (EBI): There are none! (they only describe gene loci, not gene groups)
  4. The current "nerd-case" terms: mhc_class_i, mhc_class_ii, non_classical_mhc
  5. A compromise between 1 and 4, e.g., MHC-I, MHC-II, MHC-invariant

javh commented 2 years ago

> A compromise between 1 and 4, e.g., MHC-I, MHC-II, MHC-invariant

I've seen MHC1 and MHC-1 in a few publications / databases as well.

The invariant chain abbreviation seems to be Ii. That's awful. MHC-I, MHC-II and MHC-Ii would at least work, but yuck.

bussec commented 2 years ago

Ii refers specifically to CD74, which is involved in class II presentation. The "invariant" MHC molecules I was referring to are things like CD1d or HLA-F, which are structurally closer to MHC-I. There are multiple terms for these genes/molecules: invariant, non-polymorphic, non-classical. In my suggestion I went for "invariant", but I see that there is potential for confusion with Ii, so maybe MHC-nonclassical would be a better choice.

javh commented 2 years ago

@bussec Ah, gotcha. I did not know that. Well that makes things harder.

bussec commented 2 years ago

And just for the avoidance of doubt: Yes, Roman numerals are the sole culprit behind the fall of the Roman Empire. But for MHC classes they have been used by immunologists for decades, so I would stick to them if possible.

bussec commented 2 years ago

@bcorrie @javh If we agree on the terms for mhc_class, this is good to go from my side. Additional changes since yesterday:

javh commented 2 years ago

@bussec, Good with me. I like the compromise names. They are clean while being familiar enough that they won't cause confusion.

Speaking of which, if we're making stuff up, we could use MHC-N or MHC-NC instead of MHC-nonclassical. That said, I'm not pushing for either, because -nonclassical has the benefit of being immediately understood.
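As an illustration of what the compromise vocabulary would mean for tool output, here is a minimal validation sketch. It assumes, hypothetically, that the allowed values of mhc_class are exactly MHC-I, MHC-II and MHC-nonclassical as discussed in this thread; the record layout (plain dicts) and the function name `invalid_mhc_classes` are illustrative only, not part of the AIRR schema or its tooling.

```python
# Hypothetical vocabulary: assumes the compromise names from this thread
# (MHC-I, MHC-II, MHC-nonclassical) are the only allowed mhc_class values.
MHC_CLASS_VALUES = frozenset({"MHC-I", "MHC-II", "MHC-nonclassical"})


def invalid_mhc_classes(records: list[dict]) -> list[tuple[int, object]]:
    """Return (index, value) pairs for records whose mhc_class is not allowed.

    Matching is exact and case-sensitive, so spellings such as
    "MHC class II" or "mhc-i" are reported rather than silently accepted.
    A missing field is reported with the value None.
    """
    return [
        (i, rec.get("mhc_class"))
        for i, rec in enumerate(records)
        if rec.get("mhc_class") not in MHC_CLASS_VALUES
    ]


if __name__ == "__main__":
    genotypes = [
        {"mhc_class": "MHC-I"},
        {"mhc_class": "MHC class II"},
        {"mhc_class": "MHC-nonclassical"},
        {},
    ]
    print(invalid_mhc_classes(genotypes))
    # [(1, 'MHC class II'), (3, None)]
```

Exact matching is used here because the point of a controlled vocabulary is that tools emit one spelling; a more forgiving validator could normalize case or whitespace, but that would reintroduce the variants the thread is trying to avoid.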

bcorrie commented 2 years ago

@bussec looks good to me. I am a bit confused by the conflicts above. Seems like the ontology merge and this merge are conflicting - do you want to resolve these?

bussec commented 2 years ago

@bcorrie Will try to merge this now... the errors are due to the changes in the #574 PR, not #524, for which I already did the rebase.