Closed aslgraefe closed 1 year ago
Yes, I believe the variant data is superior to that of a gene.
1) Right now this handles one variant per case/phenopacket, right? However, will this always be the case for the ERKER format? How about diseases with autosomal recessive mode of inheritance, where a pair of heterozygous variants can be disease causing (compound heterozygosity)? In that case, we need to create 2 genomic interpretations, one per variant, and I am not sure the current architecture would allow that.
ANSWER: In the new version of the ERKER_v1.6 we allow these choices for the zygosity: Homozygous // (simple) Heterozygous // Compound heterozygous // Double heterozygous // Hemizygous // Not recorded
- we also allow multiple genomic interpretations: 3 clinical relevant variants and 5 genetic side variants. Therefore, we could allow a compound heterozygous patient to be captured with both clinical relevant variants.
- However, for now the ERKER form only allows zygosity of the main diagnosis i.e. the main disease causing mutation. For this analysis with MC4R-deficiency this will be sufficient and we will stay with v1.6.
- as much as I understand correctly, you would suggest adding zygosity to all genetic clinical relevant variants and side variants, right? I noted this and we could implenet this into our version v1.7.
2) genomic_interpretation_variant = GenomicInterpretation( subject_or_biosample_id=phenopacket_id, interpretation_status=interpretation_status, variant_interpretation=variant_interpretation ) I believe we should only keep the genomic interpretation with the variant data (above). In this setting, where we have short variants (single nucleotide polymorphisms (SNP) or short insertions/deletions), we can infer the gene from the variant data. In general, the
gene
field of the GenomicInterpretation should be used if we do not have the variant data, for instance, to represent the candidate or the most likely causal gene.ANSWER: we dropped out the 'gene' field of GenomicInterpretation
3) if we are including the diagnosis in the phenopacket, then we must add the
disease
field - a required field of the Diagnosis element. The disease is an ontology class, so we need an id (e.g.MONDO:0019115
) and a label (e.g.obesity due to melanocortin 4 receptor deficiency
) as in http://purl.obolibrary.org/obo/MONDO_0019115. Adam, Filip, please select the disease code! genomic_interpretations=[ please delete the comment and the line below if you agree with dropping the gene interpretation. -> # genomic_interpretation_gene,ANSWER: we changed the disease code and label to the Diagnosis (Interpretation) block.
4) Generates a random str like
4d062c1e-ea58-4ad9-8307-b7d07fe6b0ab
consider settingprogress_status
(https://phenopacket-schema.readthedocs.io/en/latest/interpretation.html) #progressstatus Right now it is set to the default value =UNKNOWN_PROGRESS
. However, we do have the diagnosis, which is inconsistent with the unknown progress. I think this is a clinical question. Adam, can we consider the variants as causal/contributory (per https://phenopacket-schema.readthedocs.io/en/latest/genomic-interpretation.html#interpretationstatus) to the disease? If yes, then we can make the diagnosis and set the progress status, right?ANSWER: we adapted the interpretation_status to 'CONTRIBUTORY' and the progress_status to 'SOLVED'.
@ielis (& @frehburg) please find the answers and implementations to each comments. Thank you for your comments!
Regarding my point 1) above - I only see adding one genomic interpretation to the Interpretation > Diagnosis
element here. Do you plan to keep just one variant for this dataset? That's OK if there truly is just 1 variant per patient. However, I don't think this will necessarily work for other datasets, e.g. AR diseases with 2 compound heterozygous variants. Most likely, the entire function _map_interpretation
will have to be updated to accommodate the change of the variant cardinality from 1
per interpretationto
1..*`.
as much as I understand correctly, you would suggest adding zygosity to all genetic clinical relevant variants and side variants, right? I noted this and we could implenet this into our version v1.7.
Regarding adding zygosity - you're setting the allelic_state
field for the VariationDescriptor
, which is the right thing to do, so you should be all set there. No changes needed. I see this in the phenopackets:
"allelicState": {
"id": "GENO:0000135",
"label": "heterozygous"
}
which is correct! :)
Dear Daniel, thank you for your answer. Well, we sometimes also have two (once three) mutations per patient - but considered these to be within the same genomicInterpretation as different expressions like so:
"genomicInterpretations": [
{
"subjectOrBiosampleId": "21",
"interpretationStatus": "CONTRIBUTORY",
"variantInterpretation": {
"variationDescriptor": {
"id": "id:A",
"expressions": [
{
"syntax": "hgvs",
"value": "NP_005903.2:p.Val103Ile"
},
{
"syntax": "hgvs",
"value": "NM_005912.3:c.307G>A"
}
],
"allelicState": {
"id": "GENO:0000135",
"label": "heterozygous
}
}
}
}
]
Is this still valid? And regarding the Geno-Pheno-correlating, does the tool allow the multiple expressions within one genomicInterpretation?
All the best, Adam
Hi Adam, no, I think this is not the right way to do this. First, the expressions should represent descriptions of the same variant in different notations/nomenclatures.
For instance, the variant with the following genomic coordinates:
can be also described using HGVS syntax with respect to the coding DNA of the gene (c.
):
NM_000088.3:c.589G>T
or with respect to the genomic reference sequence (g.
, also HGVS):
NG_007400.1:g.8638G>T
NG_007400.1 is a GenBank entry with COL1A1 sequence
or SPDI notation
GRCh38:17:50198002:C:A
So, expressions represent different notations of one variant, not several variants in the subject/patient.
You must encode each variant into its own GenomicInterpretation
element and set it into genomic_interpretations
field of the corresponding Diagnosis.
And regarding the Geno-Pheno-correlating, does the tool allow the multiple expressions within one genomicInterpretation?
Genophenocorr considers variants from all genomic interpretations of the phenopacket in the analysis.
Hi Daniel,
Thank you very much for your comment and explanation.
You must encode each variant into its own GenomicInterpretation element and set it into
genomic_interpretations field of the corresponding Diagnosis.
I will implement this tomorrow morning right away.
Genophenocorr considers variants from all genomic interpretations of the phenopacket in the analysis.
Perfect, thank you. Can you talk tomorrow any time for an introduction?
Hi Adam, the repo is at https://github.com/monarch-initiative/genophenocorr, we recently made it public.
We are working on making a PyPi release in these days, and there are online docs as well as some example notebooks (e.g. for analysis of KBG variants here).
However, the docs and notebooks are still work in progress. So, feel free to look at the code, but please be aware that the material may be a bit outdated. We'll be fixing this in the coming days..
Regarding adding zygosity - you're setting the
allelic_state
field for theVariationDescriptor
, which is the right thing to do, so you should be all set there. No changes needed. I see this in the phenopackets:
Hi Daniel,
we have updated the form to version v1.7. As we sometimes have more variants, each variant has its own zygosity now.
Hi Daniel,
You must encode each variant into its own GenomicInterpretation element and set it into genomic_interpretations field of the corresponding Diagnosis.
We have implemented this, so that each variant has its own GenomicInterpretation element with its mutations with zygosity.
Thank you!
1) Right now this handles one variant per case/phenopacket, right? However, will this always be the case for the ERKER format? How about diseases with autosomal recessive mode of inheritance, where a pair of heterozygous variants can be disease causing (compound heterozygosity)? In that case, we need to create 2 genomic interpretations, one per variant, and I am not sure the current architecture would allow that.
2) genomic_interpretation_variant = GenomicInterpretation( subject_or_biosample_id=phenopacket_id, interpretation_status=interpretation_status, variant_interpretation=variant_interpretation ) I believe we should only keep the genomic interpretation with the variant data (above). In this setting, where we have short variants (single nucleotide polymorphisms (SNP) or short insertions/deletions), we can infer the gene from the variant data. In general, the
gene
field of the GenomicInterpretation should be used if we do not have the variant data, for instance, to represent the candidate or the most likely causal gene.3) if we are including the diagnosis in the phenopacket, then we must add the
disease
field - a required field of the Diagnosis element. The disease is an ontology class, so we need an id (e.g.MONDO:0019115
) and a label (e.g.obesity due to melanocortin 4 receptor deficiency
) as in http://purl.obolibrary.org/obo/MONDO_0019115. Adam, Filip, please select the disease code! genomic_interpretations=[ please delete the comment and the line below if you agree with dropping the gene interpretation. -> # genomic_interpretation_gene,4) Generates a random str like
4d062c1e-ea58-4ad9-8307-b7d07fe6b0ab
consider settingprogress_status
(https://phenopacket-schema.readthedocs.io/en/latest/interpretation.html)progressstatus
Right now it is set to the default value =
UNKNOWN_PROGRESS
. However, we do have the diagnosis, which is inconsistent with the unknown progress. I think this is a clinical question. Adam, can we consider the variants as causal/contributory (per https://phenopacket-schema.readthedocs.io/en/latest/genomic-interpretation.html#interpretationstatus) to the disease? If yes, then we can make the diagnosis and set the progress status, right?