bussec closed this issue 2 years ago
I have no real opinion on this, as we have no MHC data in any of our studies, but I know @schristley has some MHC data in VDJServer in some of their studies...
I've removed this from the germlines project for the time being because I think we need some discussion before agreeing where it fits and what scope we are taking on.
Based on discussions at the Standards meeting, I think we agreed that we want to have, for an individual, a single MHCGenotypeSet with at most two MHCGenotype objects, one for MHC Class 1 and one for MHC Class 2. This would imply that genotype_class would be a controlled vocabulary with something like this:
mhc_class_1
mhc_class_2
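To make the proposal concrete, here is a minimal Python sketch of the constraint described above: one MHCGenotypeSet per individual, holding at most one MHCGenotype per class. The object and vocabulary names come from this thread; the `genotypes` field name and the validation logic are assumptions for illustration, not the agreed schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class GenotypeClass(str, Enum):
    """Controlled vocabulary proposed in this thread for genotype_class."""
    MHC_CLASS_1 = "mhc_class_1"
    MHC_CLASS_2 = "mhc_class_2"


@dataclass
class MHCGenotype:
    genotype_class: GenotypeClass


@dataclass
class MHCGenotypeSet:
    # "genotypes" is a hypothetical field name used only for this sketch.
    genotypes: List[MHCGenotype] = field(default_factory=list)

    def __post_init__(self):
        classes = [g.genotype_class for g in self.genotypes]
        # At most two entries, and no class may appear twice.
        if len(classes) > 2 or len(set(classes)) != len(classes):
            raise ValueError("at most one MHCGenotype per MHC class")
```

Parsing a stored value with `GenotypeClass("mhc_class_1")` would reject anything outside the vocabulary, which is the behaviour a controlled vocabulary implies.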
For the genotype_process, I talked to Nina and she suggested a couple of categorizations, some being orthogonal. Nina said:
There are four categories of molecular HLA typing methods:
Sequence-specific primer (SSP) typing on DNA
Sequence-specific oligonucleotide probe (SSOP) on DNA
Sequencing-based typing (SBT), usually by NGS on DNA
Inference of HLA alleles from genome-wide DNA or RNA sequencing
The choice of method depends on the application. SSP may be used if you are looking for a specific allele
but is also suitable for low-resolution typing. SSOP is a technique that is mainly used in clinical labs and
can scale to be used on large numbers of samples (high-throughput). SBT by NGS is usually used for
high-resolution typing which is used in bone marrow transplants, where high-quality allele-level information
across multiple alleles is required for appropriate risk stratification and matching. SBT, unlike SSP and
SSOP, is also useful for the discovery of new alleles. Finally, inference is typically used when samples are
either unavailable for HLA typing or financial constraints prevent HLA typing using a clinical-grade assay.
You could potentially reduce this list to three options:
PCR-based typing, without sequence data
Sequencing-based typing
Inference-based typing
Another way to break it down that might be helpful when searching data sets is by resolution of the method,
in which case I would use:
Low-resolution
High-resolution
But if you do this, then I would still have separate categories for the method...
So maybe the three options above with high and low versions of each - something like?
Not really sure if low/high applies to each of the three types. Maybe not the inference-based methods? Others more expert, please comment 8-)
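One way to keep method and resolution orthogonal, sketched in Python: record them as two separate values, with resolution optional so that it can be left out where it may not apply (for example, inference). The three method categories and the low/high split are Nina's; the string values, class names and the `label` helper are hypothetical.

```python
from enum import Enum
from typing import NamedTuple, Optional


class TypingMethod(str, Enum):
    # String values are placeholders, not agreed vocabulary terms.
    PCR_BASED = "pcr_based"
    SEQUENCING_BASED = "sequencing_based"
    INFERENCE_BASED = "inference_based"


class TypingResolution(str, Enum):
    LOW = "low_resolution"
    HIGH = "high_resolution"


class GenotypeProcess(NamedTuple):
    method: TypingMethod
    # None means "not stated or not applicable".
    resolution: Optional[TypingResolution] = None


def label(process: GenotypeProcess) -> str:
    """Combine method and resolution into one searchable string."""
    if process.resolution is None:
        return process.method.value
    return f"{process.method.value}/{process.resolution.value}"
```

With this shape a search can filter on method alone, on resolution alone, or on both, without enumerating every combination as its own term.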
Grouping into MHC class I and class II captures the way we think of restrictions in terms of peptides presented and the type of interacting T cells.
It can of course be read from the actual MHC names, since the class is in the top-level part of the name, but the distinction is so deeply rooted (I think) that it can be somewhat confusing if we do not group into MHC-I and MHC-II.
So... what Brian says.
From the call: MHC-I and MHC-II? HLA Class I and HLA Class II? Check IMGT.
Hello,
On the IMGT web site, we used MH1 and MH2 for MHC-I and MHC-II, respectively, given their different structures:
I-ALPHA + B2M : MH1
II-ALPHA + II-BETA : MH2
https://www.frontiersin.org/articles/10.3389/fimmu.2014.00022/full
Marie-Paule
Created a pull request to close this issue. Added the vocabulary terms as above as placeholders - probably not exactly what we want. genotype_process is an enum; we probably want it as a string with a list of recommended values. That is less restrictive than an enum, since we want to leave flexibility for the user???
@bcorrie I will replace the enum with a comment that recommended terms will be provided in the AIRR Docs, as I think it would be good to include Nina's comments and this would be rather lengthy if we put those in the description.
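For illustration, the difference between an enum and "a string with recommended values" could look like the Python sketch below: any non-empty string is accepted, but values outside the recommended list trigger a warning instead of an error. The term list and the function name are hypothetical placeholders, not the terms added in the pull request.

```python
import warnings

# Hypothetical placeholder terms; the real list would live in the AIRR Docs.
RECOMMENDED_GENOTYPE_PROCESS = {
    "pcr_based",
    "sequencing_based",
    "inference_based",
}


def check_genotype_process(value: str) -> str:
    """Accept any non-empty string, but warn on non-recommended terms."""
    if not isinstance(value, str) or not value.strip():
        raise ValueError("genotype_process must be a non-empty string")
    if value not in RECOMMENDED_GENOTYPE_PROCESS:
        warnings.warn(
            f"genotype_process {value!r} is not a recommended term",
            UserWarning,
        )
    return value
```

A strict enum would turn the warning branch into a hard failure; keeping it a warning is what gives the user the flexibility discussed above.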
Is it meaningful to have a grouping for MHCGenotype based on MHC class I/class II?
In contrast to IG and TR, both classes of MHC genes and both chain types of Class II are located on the same locus (except in Osteichthyes). Thus, if MHCGenotype.genotype_class actually annotates the locus, it would always be MHC and therefore could be dropped. This would also make MHCGenotype redundant, as there would only be a single instance of this object per subject, so all properties could as well be moved into the MHCGenotypeSet.
Alternatively, we could decide to group the genes into two objects, based on their classification as either Class I or II, which is not always obvious from the gene name (e.g., in mouse).
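If we go with the second option, the class has to come from somewhere other than the gene name. A minimal Python sketch, assuming an explicit lookup table is maintained: the table below is deliberately tiny and illustrative (a few well-known human and mouse genes), and the function name, the "unclassified" bucket and the allele-string parsing are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Illustrative, incomplete lookup; a real table would need curation.
GENE_CLASS = {
    "HLA-A": "mhc_class_1",
    "HLA-B": "mhc_class_1",
    "HLA-C": "mhc_class_1",
    "HLA-DRB1": "mhc_class_2",
    "HLA-DQB1": "mhc_class_2",
    "HLA-DPB1": "mhc_class_2",
    # Mouse names do not spell out the class, hence the explicit table.
    "H2-K1": "mhc_class_1",
    "H2-Ab1": "mhc_class_2",
}


def group_alleles_by_class(alleles: Iterable[str]) -> Dict[str, List[str]]:
    """Group allele names by MHC class using the gene part before '*'."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for allele in alleles:
        gene = allele.split("*", 1)[0]
        groups[GENE_CLASS.get(gene, "unclassified")].append(allele)
    return dict(groups)
```

Genes missing from the table end up under "unclassified" rather than being guessed from their name.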