BRCAChallenge / brca-exchange

Overall management and deployment of the BRCA Exchange web portal and pipeline scripts
http://brcaexchange.org

BIC clinical classification lost in what looks like most variants with BIC annotations #97

Closed melissacline closed 7 years ago

melissacline commented 8 years ago

Here are some examples from the releaseDiff.py output:

Clinical_classification_BIC variant chr17:43074347:CAAGT>C major change: Class 5 -
Clinical_classification_BIC variant chr17:43074427:C>T major change: - Pending
Clinical_classification_BIC variant chr17:43074489:TC>T major change: Class 5 -
Clinical_classification_BIC variant chr17:43076488:CTT>C major change: Class 5 -
Clinical_classification_BIC variant chr17:43076578:ATAG>AAA major change: Class 5 -
Clinical_classification_BIC variant chr17:43082460:C>CT major change: - Class 5
Clinical_classification_BIC variant chr17:43082508:AAC>A major change: Class 5 -
Clinical_classification_BIC variant chr17:43082564:GGT>G major change: Class 5 -
Clinical_classification_BIC variant chr17:43090999:CTT>C major change: Class 5 -
Clinical_classification_BIC variant chr17:43091005:TCA>T major change: Class 5 -
Clinical_classification_BIC variant chr17:43091007:ACT>A major change: Class 5 -

melissacline commented 8 years ago

And in the same package, the Germline_or_Somatic_BIC field:

Germline_or_Somatic_BIC variant chr17:43106472:TC>T major change: G -
Germline_or_Somatic_BIC variant chr17:43106488:CT>C major change: G -
Germline_or_Somatic_BIC variant chr17:43106523:GC>G major change: G -
Germline_or_Somatic_BIC variant chr17:43115729:CA>C major change: G -
Germline_or_Somatic_BIC variant chr17:43115746:CTT>C major change: G -
Germline_or_Somatic_BIC variant chr17:43115775:CCA>C major change: G -
Germline_or_Somatic_BIC variant chr17:43115789:TA>T major change: G -

zfisch commented 8 years ago

I do see these changes in the built.tsv and aggregated.tsv files you sent to me, but after creating those same files myself, I see no changes for Germline_or_Somatic_BIC or Clinical_classification_BIC. I made some changes to variant-merging.py and brca_pseudonym_generator.py that may have resolved those issues.

zfisch commented 8 years ago

Please disregard my previous comment. I see that aggregated.tsv is from the old dataset and built.tsv is from the new dataset. Interestingly, running releaseDiff.py against the new aggregated.tsv and built.tsv found several ClinVar classification changes, e.g.:

Clinical_Significance_ClinVar variant chr17:43051061:ACCT>AATGTTG major change: - Pathogenic

Will explore further.

zfisch commented 8 years ago

Some updates:

Regarding Germline_or_Somatic_BIC

After tracking chr17:43106472:TC>T all the way back to the BIC VCF file, it looks like two variants at position 43106472 get merged into a single variant at position 43106471 and then make their way into built.tsv as chr17:43106470:A>AT with the expected Germline_or_Somatic_BIC value of G. Meanwhile, chr17:43106472:TC>T looks like it was derived from ClinVar and has no Germline_or_Somatic_BIC value anywhere, from the original VCF all the way through the merging process. As a result, the new built.tsv file contains two separate variants, one from BIC and one from ClinVar.

In the old data, chr17:43106470:A>AT does not exist, but chr17:43106472:TC>T is a single variant derived from both BIC and ClinVar.

It seems that every variant listed above is an example of a variant that was being merged in the old data but is not merged in the new data. I don't have enough information to know if any of these variants should be merged or not.
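One way to gather the missing information is to apply both representations to the reference sequence and compare the results: two records describe the same change only if they produce the same sequence. The following is a minimal sketch of that check, not what variant-merging.py does. The function names are hypothetical and the reference is a toy six-base string, not real BRCA1 sequence.

```python
def apply_variant(reference, ref_start, pos, ref_allele, alt_allele):
    """Return the sequence obtained by applying one variant to `reference`.

    `reference` is a stretch of sequence whose first base sits at 1-based
    coordinate `ref_start`; `pos` is the 1-based coordinate of the first
    base of `ref_allele`.
    """
    offset = pos - ref_start
    observed = reference[offset:offset + len(ref_allele)]
    if observed != ref_allele:
        # The record does not agree with the reference, so it cannot be applied.
        raise ValueError(f"reference has {observed!r} at {pos}, not {ref_allele!r}")
    return reference[:offset] + alt_allele + reference[offset + len(ref_allele):]


def same_variant(reference, ref_start, first, second):
    """True if both (pos, ref, alt) records yield the same altered sequence."""
    return (apply_variant(reference, ref_start, *first)
            == apply_variant(reference, ref_start, *second))


# Toy reference (an assumption for illustration): G C T T T A at 100..105.
toy_reference = "GCTTTA"
toy_start = 100

# The same one-base deletion in a T run, written at two different positions.
deletion_left = (101, "CT", "C")
deletion_right = (103, "TT", "T")

# An insertion at the same site, which is a different change.
insertion = (101, "C", "CT")

print(same_variant(toy_reference, toy_start, deletion_left, deletion_right))
print(same_variant(toy_reference, toy_start, deletion_left, insertion))
```

Records that compare equal here are candidates for merging; records that differ, such as a deletion and an insertion at nearby coordinates, are not.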

Regarding Clinical_classification_BIC

It looks like this is the same issue as with Germline_or_Somatic_BIC. Essentially, variants that were merged in the old data are no longer merged.

Proposed Action

Review when merges should and shouldn't happen and make any necessary adjustments.
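As a starting point for that review, here is a minimal sketch that lists the variants whose value for a given field was populated in the old release file but is empty in the new one. It is not releaseDiff.py. The key column name Genomic_Coordinate and the helper names are assumptions for illustration, "-" is treated as empty because that is how the diffs above show a missing value, and the two in-memory tables contain only the variants discussed in this thread.

```python
import csv
import io


def read_tsv(handle):
    """Read a tab-separated file into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))


def lost_values(old_rows, new_rows, key, field, empty="-"):
    """Return (variant, old_value) pairs whose `field` went empty or missing."""
    new_by_key = {row[key]: row for row in new_rows}
    lost = []
    for row in old_rows:
        old_value = row.get(field, empty)
        if old_value in ("", empty):
            continue  # nothing to lose
        new_row = new_by_key.get(row[key])
        new_value = new_row.get(field, empty) if new_row else empty
        if new_value in ("", empty):
            lost.append((row[key], old_value))
    return lost


# Stand-ins for the old and new release files (column name is hypothetical).
old_tsv = io.StringIO(
    "Genomic_Coordinate\tGermline_or_Somatic_BIC\n"
    "chr17:43106472:TC>T\tG\n"
)
new_tsv = io.StringIO(
    "Genomic_Coordinate\tGermline_or_Somatic_BIC\n"
    "chr17:43106470:A>AT\tG\n"
    "chr17:43106472:TC>T\t-\n"
)

lost = lost_values(read_tsv(old_tsv), read_tsv(new_tsv),
                   key="Genomic_Coordinate", field="Germline_or_Somatic_BIC")
for variant, old_value in lost:
    print(variant, "lost", old_value)
```

Each variant reported this way can then be checked by hand to decide whether the old merge or the new split is the correct behavior.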