airr-community / airr-standards

AIRR Community Data Standards
https://docs.airr-community.org
Creative Commons Attribution 4.0 International

Grouping of MHC genotype information #573

Closed bussec closed 2 years ago

bussec commented 2 years ago

Is it meaningful to have a grouping for MHCGenotype based on MHC class I/class II?

In contrast to IG and TR, both classes of MHC genes and both chain types of Class II are located on the same locus (except in Osteichthyes). Thus, if MHCGenotype.genotype_class actually annotates the locus, it would always be MHC and therefore could be dropped. This would also make MHCGenotype redundant, as there would only be a single instance of this object per subject, so all properties could as well be moved into the MHCGenotypeSet.

Alternatively, we could decide to group the genes into two objects, based on their classification as either Class I or II, which is not always obvious from the gene name (e.g., in mouse).
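As a rough sketch of that second option: when the class cannot be read off the gene name, the grouping needs an explicit lookup. The table and the function name `group_by_class` below are hypothetical and only illustrative; they are not part of the AIRR schema.

```python
# Hypothetical lookup table: gene symbol -> MHC class.
# Illustrative entries only, not an agreed AIRR vocabulary.
MHC_CLASS_BY_GENE = {
    "HLA-A": "I",
    "HLA-DRB1": "II",
    "H2-K1": "I",    # mouse class I; the class is not visible in the symbol
    "H2-Ab1": "II",  # mouse class II
}


def group_by_class(genes):
    """Split gene symbols into class I and class II buckets via the lookup."""
    groups = {"I": [], "II": []}
    for gene in genes:
        if gene not in MHC_CLASS_BY_GENE:
            # Refuse to guess the class from the name.
            raise ValueError(f"no class annotation for gene {gene!r}")
        groups[MHC_CLASS_BY_GENE[gene]].append(gene)
    return groups
```

Each bucket would then correspond to one MHCGenotype object.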

bcorrie commented 2 years ago

I have no real opinion on this, as we have no MHC data in any of our studies, but I know @schristley has some MHC data in VDJServer in some of their studies...

williamdlees commented 2 years ago

I've removed this from the germlines project for the time being because I think we need some discussion before agreeing where it fits and what scope we are taking on.

bcorrie commented 2 years ago

Based on discussions at the Standards meeting, I think we agreed that each individual should have a single MHCGenotypeSet containing at most two MHCGenotype objects, one for MHC Class I and one for MHC Class II. This would imply that genotype_class would be a controlled vocabulary with something like this:

bcorrie commented 2 years ago

For the genotype_process, I talked to Nina and she suggested a couple of categorizations, some being orthogonal. Nina said:

 There are four categories of molecular HLA typing methods:

    Sequence-specific primer (SSP) typing on DNA
    Sequence-specific oligonucleotide probe (SSOP) on DNA
    Sequencing-based typing (SBT), usually by NGS on DNA
    Inference of HLA alleles from genome-wide DNA or RNA sequencing

The choice of method depends on the application. SSP may be used if you are looking for a specific allele
but is also suitable for low-resolution typing. SSOP is a technique that is mainly used in clinical labs and
can scale to be used on large numbers of samples (high-throughput). SBT by NGS is usually used for
high-resolution typing which is used in bone marrow transplants, where high-quality allele-level information
across multiple alleles is required for appropriate risk stratification and matching. SBT, unlike SSP and
SSOP, is also useful for the discovery of new alleles.  Finally, inference is typically used when samples are
either unavailable for HLA typing or financial constraints prevent HLA typing using a clinical-grade assay. 

You could potentially reduce this list to three options:

    PCR-based typing, without sequence data
    Sequencing-based typing
    Inference-based typing

Another way to break it down that might be helpful when searching data sets is by resolution of the method,
in which case I would use:

    Low-resolution
    High-resolution

But if you do this, then I would still have separate categories for the method...

So maybe the three options above with high and low versions of each - something like?

Not really sure if low/high applies to each of the three types. Maybe not the inference-based methods? Others more expert, please comment 8-)
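One way to keep the two categorizations orthogonal, rather than enumerating every method/resolution pair, is to store them as separate fields. This is only a sketch: the class and field names (`TypingMethod`, `Resolution`, `GenotypeProcess`, `by_resolution`) are hypothetical, and the optional resolution is an assumption that allows for methods where low/high may not apply.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TypingMethod(Enum):
    # The three reduced options suggested above.
    PCR_BASED = "PCR-based typing, without sequence data"
    SEQUENCING_BASED = "Sequencing-based typing"
    INFERENCE_BASED = "Inference-based typing"


class Resolution(Enum):
    LOW = "Low-resolution"
    HIGH = "High-resolution"


@dataclass(frozen=True)
class GenotypeProcess:
    method: TypingMethod
    # Assumption: None means resolution is unknown or not applicable.
    resolution: Optional[Resolution] = None


def by_resolution(records, resolution):
    """Return the records whose resolution matches, regardless of method."""
    return [r for r in records if r.resolution is resolution]
```

With this layout a search by resolution does not need to know the method, and a method can be recorded without committing to a resolution.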

ustervbo commented 2 years ago

Grouping into MHC class I and class II captures the way we think of restrictions in terms of peptides presented and the type of interacting T cells.

It can of course be read from the actual MHC names, since the class is in the top-level part of the name, but the distinction is so deeply rooted (I think) that it could be somewhat confusing if we did not group into MHC-I and MHC-II.

So... what Brian says.

javh commented 2 years ago

From the call:

mplefranc863 commented 2 years ago

Bonjour, We used MH1 and MH2 for MHC-I and MHC-II, respectively, given their different structures, on the IMGT web site:

I-ALPHA + B2M : MH1
II-ALPHA + I-BETA : MH2

https://www.frontiersin.org/articles/10.3389/fimmu.2014.00022/full

Marie-Paule

bcorrie commented 2 years ago

Created a pull request to close this issue. Added the vocabulary terms as above as placeholders; they are probably not exactly what we want. genotype_process is currently an enum, but we probably want it as a string with a list of recommended values. That would be less restrictive than an enum and give the user more flexibility?
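To make the enum versus recommended-values trade-off concrete, here is a minimal sketch of both behaviours in one check. The function name `check_genotype_process`, the `strict` flag, and the term list (copied from the three-option list earlier in this thread) are assumptions for illustration, not the schema's actual values.

```python
import warnings

# Hypothetical recommended terms, taken from the three-option list
# earlier in this thread; not the agreed vocabulary.
RECOMMENDED_GENOTYPE_PROCESS = (
    "PCR-based typing, without sequence data",
    "Sequencing-based typing",
    "Inference-based typing",
)


def check_genotype_process(value, strict=False):
    """Return True if value is a recommended term.

    strict=True behaves like an enum and rejects other values.
    strict=False accepts any string but warns about unlisted terms.
    """
    if value in RECOMMENDED_GENOTYPE_PROCESS:
        return True
    if strict:
        raise ValueError(f"genotype_process {value!r} is not an allowed term")
    warnings.warn(f"genotype_process {value!r} is not a recommended term")
    return False
```

The non-strict mode keeps user-supplied descriptions valid while still flagging them, which is roughly what a string with recommended values would give.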

bussec commented 2 years ago

@bcorrie I will replace the enum with a comment that recommended terms will be provided in the AIRR Docs, as I think it would be good to include Nina's comments and this would be rather lengthy if we put those in the description.