bigbio / py-pgatk

Python tools for proteogenomics analysis toolkit
Apache License 2.0

Incompatible COSMIC mutation records #72

Open husensofteng opened 2 years ago

husensofteng commented 2 years ago

Some mutation records from the COSMIC database cannot be parsed by the cosmic-to-proteindb command, either because the variants are complex or because the records do not follow the expected format.

For example, in the following record the alternative allele cannot be determined from the designated columns (c.? and p.R882(H^C)):

DNMT3A ENST00000264709.7 2739 2978 2646118 2646118 2506326 haematopoietic_and_lymphoid_tissue NS NS NS haematopoietic_neoplasm acute_myeloid_leukaemia NS NS n COSM6498615 178530426 c.? p.R882(H^C) Substitution - Missense - - Variant of unknown origin 25858894 blood-bone marrow primary

Similarly, in this record the mutation is written as c.? and p.614_615>21:

FLT3 ENST00000241453.11 2982 3765 1291198 1291198 1202297 haematopoietic_and_lymphoid_tissue NS NS NS haematopoietic_neoplasm acute_myeloid_leukaemia M3 NS n COSM36079 178534521 c.? p.614_615>21 Complex - insertion inframe het - - Variant of unknown origin 9305596 blood-bone marrow NS
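
As a rough illustration of how such records could be screened out before conversion, here is a minimal sketch. It is not taken from py-pgatk: `is_ambiguous` and `SIMPLE_SUBSTITUTION` are hypothetical names, and the rule itself is an assumption (a record is flagged when the CDS mutation is unknown and the protein mutation is not a plain single-residue substitution such as p.R882H).

```python
import re

# Hypothetical helper, not part of py-pgatk.
# Matches only plain single-residue substitutions such as "p.R882H" or "p.Q61*".
SIMPLE_SUBSTITUTION = re.compile(r"^p\.[A-Z*]\d+[A-Z*]$")


def is_ambiguous(cds_mutation: str, aa_mutation: str) -> bool:
    """Return True when no single alternative allele can be read from the record.

    Assumed rule: the CDS mutation is unknown ("c.?" or empty) and the
    protein mutation is not a simple substitution.
    """
    cds_unknown = cds_mutation.strip() in ("c.?", "")
    aa_simple = SIMPLE_SUBSTITUTION.match(aa_mutation.strip()) is not None
    return cds_unknown and not aa_simple


# The two records from this issue are both flagged.
for cds, aa in [("c.?", "p.R882(H^C)"), ("c.?", "p.614_615>21")]:
    print(cds, aa, is_ambiguous(cds, aa))
```

Skipping and logging the flagged records, rather than failing on them, would at least let the rest of the file be processed.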