Closed frehburg closed 1 year ago
After this is fixed we can continue on #174
ValueError: Cannot determine variant coordinate from genomic interpretation subject_or_biosample_id: "36"
interpretation_status: CONTRIBUTORY
variant_interpretation {
variation_descriptor {
id: "id:A"
expressions {
syntax: "hgvs"
value: "NP_005903.2:p.(Arg165Gln)"
}
expressions {
syntax: "hgvs"
value: "NM_005912.3:c.335C>T"
}
allelic_state {
id: "GENO:0000136"
label: "homozygous"
}
}
}
ValueError: Cannot determine variant coordinate from genomic interpretation subject_or_biosample_id: "41"
interpretation_status: CONTRIBUTORY
variant_interpretation {
variation_descriptor {
id: "id:A"
expressions {
syntax: "hgvs"
value: "NP_005903.2:p.(Val103Ile)"
}
expressions {
syntax: "hgvs"
value: "NM_005912.3:c.496G>C"
}
allelic_state {
id: "GENO:0000135"
label: "heterozygous"
}
}
}
ValueError: Cannot determine variant coordinate from genomic interpretation subject_or_biosample_id: "7"
interpretation_status: CONTRIBUTORY
variant_interpretation {
variation_descriptor {
id: "id:A"
expressions {
syntax: "hgvs"
value: "NP_005903.2:p.(Ser85Gly)"
}
expressions {
syntax: "hgvs"
value: "NM_005912.3:c.751A>C"
}
allelic_state {
id: "GENO:0000135"
label: "heterozygous"
}
}
}
ValueError: Cannot determine variant coordinate from genomic interpretation subject_or_biosample_id: "17"
interpretation_status: CONTRIBUTORY
variant_interpretation {
variation_descriptor {
id: "id:A"
expressions {
syntax: "hgvs"
value: "NP_005903.2:p.(Ile251Leu)"
}
expressions {
syntax: "hgvs"
value: "NM_005912.3:c.253A>G"
}
allelic_state {
id: "GENO:0000136"
label: "homozygous"
}
}
}
It seems that the genophenocorr expects a VCF format as variationDescriptor. However, we only have HGVS values
@ielis Could you please take a look at this? These errors are being generated in the notebook MC4R.ipynb
(on branch #174) by this code:
from genophenocorr.preprocessing import configure_caching_patient_creator
from genophenocorr.preprocessing import load_phenopacket_folder
patient_creator = configure_caching_patient_creator(hpo)
phenopacket_input_folder = r'..\data\out\phenopackets\example-phenopackets-from-synthetic-data'
patient_list = load_phenopacket_folder(pp_directory=phenopacket_input_folder, patient_creator=patient_creator)
Hi @graefea @frehburg
I think your code is OK, the issue is that we do not yet have logic to go from HGVS annotation (e.g. NM_005912.3:c.253A>G
) to genomic coordinates (e.g. chr18:60372097T>C
). Do you think you could help out with this?
In the genophenocorr's framework, this would mean implementing genophenocorr.preprocessing.VariantCoordinateFinder to go from HGVS to genophenocorr.model.VariantCoordinates
. Having VariantCoordinates
will allow the rest of the framework to work.
We can use Ensembl's REST API to go from HGVS to genomic coordinate space. Here is an example query for the variant above, with a Python postprocessing to prettify the output:
curl 'https://rest.ensembl.org/vep/human/hgvs/NM_005912.3:c.253A>G?refseq=1' \
-H 'Content-type:application/json' | python3 -m json.tool > hgvs_MC4R_missense.json
We can obtain the bits required to construct VariantCoordinates
from the output.
I prepared a stub and some unit tests in #75 (branch get-variant-coordinates-from-hgvs
). Can you please have a look and see if you can help with this?
Thanks a lot..
ValueError: Cannot determine variant coordinate from genomic interpretation
Hey @graefea, es gibt da leider einen bug anscheinend in unseren phenopackets. Ich denke es liegt an den Rohdaten aus dem ERKER. Auf jeden fall magphenopackets
irgendetwas an der Art wie wir die Koordinaten der Variante aufgeschrieben haben. Schau dir das bitte mal an und sie ob du daraus schlauwerden kannst.LG,
Filip
P.S. Die Fehlermeldung: