BRCAChallenge / brca-exchange

Overall management and deployment of the BRCA Exchange web portal and pipeline scripts
http://brcaexchange.org

Allele origin data missing for some ENIGMA variants #95

Closed: melissacline closed this issue 7 years ago

melissacline commented 8 years ago

This was generated by releaseDiff.py. I wouldn't expect to see allele information lost for any ENIGMA variants, so this looks like a pipeline bug.

Allele_origin_ENIGMA variant chr17:43100558:GTA>G major change: Germline -
Allele_origin_ENIGMA variant chr17:43100573:A>AAT major change: Germline -
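One way to pull the lost values out of a releaseDiff.py report is sketched below. It assumes each entry has the form `FIELD variant VARIANT major change: OLD - NEW`, inferred only from the output quoted in this issue, where an empty NEW means the value was dropped. `parse_entry` and `find_lost_fields` are hypothetical helpers, not part of releaseDiff.py.

```python
from typing import Iterable, List, NamedTuple, Optional


class Change(NamedTuple):
    field: str
    variant: str
    old: str
    new: str


def parse_entry(line: str) -> Optional[Change]:
    """Split one 'FIELD variant VARIANT major change: OLD - NEW' entry (assumed format)."""
    head, sep, change = line.strip().partition(" major change: ")
    if not sep:
        return None
    field, sep, variant = head.partition(" variant ")
    if not sep:
        return None
    # Split on the last " -", assuming the new value itself never contains " -".
    old, sep, new = change.rpartition(" -")
    if not sep:
        return None
    return Change(field, variant, old.strip(), new.strip())


def find_lost_fields(lines: Iterable[str]) -> List[Change]:
    """Return entries whose old value was non-empty and whose new value is empty."""
    lost = []
    for line in lines:
        entry = parse_entry(line)
        if entry is not None and entry.old and not entry.new:
            lost.append(entry)
    return lost


if __name__ == "__main__":
    report = [
        "Allele_origin_ENIGMA variant chr17:43100558:GTA>G major change: Germline -",
        "Allele_origin_ENIGMA variant chr17:43100573:A>AAT major change: Germline -",
    ]
    for entry in find_lost_fields(report):
        print(entry.field, entry.variant, repr(entry.old))
```

Splitting on the last ` -` rather than the first keeps free-text values such as the clinical-significance comments intact, as long as the new value does not contain ` -` itself.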

melissacline commented 8 years ago

Likewise, the ENIGMA ClinVar accession seems to have been lost, probably at the same time:
ClinVarAccession_ENIGMA variant chr13:32335140:TCAAG>T major change: SCV000245013 -
ClinVarAccession_ENIGMA variant chr13:32335828:GA>G major change: SCV000245017 -

melissacline commented 8 years ago

And here are some for which "Collection_method_ENIGMA" has been lost. These are probably part of the same package:
Collection_method_ENIGMA variant chr13:32317314:CAAGT>C major change: Curation -
Collection_method_ENIGMA variant chr13:32318103:CAGT>C major change: Curation -

melissacline commented 8 years ago

And a couple for which Comment_on_clinical_significance_ENIGMA has been lost:
Comment_on_clinical_significance_ENIGMA variant chr13:32348066:TAAG>T major change: Class 1 not pathogenic based on frequency >1% in an outbred sampleset. Frequency 0.03 (African), derived from 1000 genomes (2012-04-30). -
Comment_on_clinical_significance_ENIGMA variant chr13:32349321:A>AC major change: Class 1 not pathogenic based on frequency >1% in an outbred sampleset. Frequency 0.06 (African), derived from 1000 genomes (2012-04-30). -

melissacline commented 8 years ago

And Condition_ID_type_ENIGMA:
Condition_ID_type_ENIGMA variant chr17:43095101:C>CCTA major change: OMIM -
Condition_ID_type_ENIGMA variant chr17:43095587:A>C major change: OMIM -
Condition_ID_type_ENIGMA variant chr17:43097346:TA>T major change: OMIM -
Condition_ID_type_ENIGMA variant chr17:43098204:AT>A major change: OMIM -
Condition_ID_type_ENIGMA variant chr17:43098660:C>CT major change: OMIM -

melissacline commented 8 years ago

And Condition_category_ENIGMA:
Condition_category_ENIGMA variant chr13:32317314:CAAGT>C major change: Disease -
Condition_category_ENIGMA variant chr13:32318103:CAGT>C major change: Disease -
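To check whether these losses hit the same variants together, the (field, variant) pairs can be grouped by variant. The sketch below uses the Collection_method_ENIGMA and Condition_category_ENIGMA entries quoted in this thread; `group_by_variant` is a hypothetical helper, not part of the pipeline.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_variant(lost_pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each variant to the sorted list of fields that lost their value."""
    by_variant = defaultdict(set)
    for field, variant in lost_pairs:
        by_variant[variant].add(field)
    return {variant: sorted(fields) for variant, fields in by_variant.items()}


if __name__ == "__main__":
    # (field, variant) pairs copied from the comments in this issue.
    pairs = [
        ("Collection_method_ENIGMA", "chr13:32317314:CAAGT>C"),
        ("Collection_method_ENIGMA", "chr13:32318103:CAGT>C"),
        ("Condition_category_ENIGMA", "chr13:32317314:CAAGT>C"),
        ("Condition_category_ENIGMA", "chr13:32318103:CAGT>C"),
    ]
    for variant, fields in group_by_variant(pairs).items():
        print(variant, fields)
```

For these two variants, both fields appear under the same key, which is consistent with a single underlying cause rather than independent per-field losses.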

zfisch commented 8 years ago

In each of these cases, the sources differ between the two releases. In all but perhaps three cases, the new data lists fewer sources, which likely means that variants that were previously merged are no longer being merged.

In order to resolve the issue, we'll need to decide how we want to handle merges moving forward.
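The source comparison described in this comment can be sketched as a per-variant diff of source sets between two releases. This assumes each release is available as a mapping from variant to a comma-separated source string (for example "ENIGMA,ClinVar"); that input shape and the `dropped_sources` helper are hypothetical, not the pipeline's actual data model.

```python
from typing import Dict, FrozenSet, List


def parse_sources(value: str) -> FrozenSet[str]:
    """Turn an assumed comma-separated source string into a set of source names."""
    return frozenset(s.strip() for s in value.split(",") if s.strip())


def dropped_sources(old_release: Dict[str, str],
                    new_release: Dict[str, str]) -> Dict[str, List[str]]:
    """Return {variant: sources present before but missing now}.

    Variants absent from the new release are skipped. A dropped source is only
    a hint that records previously merged into this variant are no longer
    being merged; it is not proof on its own.
    """
    result = {}
    for variant, old_value in old_release.items():
        if variant not in new_release:
            continue
        dropped = parse_sources(old_value) - parse_sources(new_release[variant])
        if dropped:
            result[variant] = sorted(dropped)
    return result
```

Listing which sources disappeared, rather than only counting them, would show whether the lost ENIGMA fields line up with ENIGMA records that stopped merging.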