d3b-center / OpenPedCan-analysis

The analysis repository for the Open Pediatric Cancer Project
https://d3b-center.github.io/OpenPedCan-analysis/

Duplicates in `fusion-annoFuse.tsv.gz` file #558

Open jharenza opened 7 months ago

jharenza commented 7 months ago

What data file(s) does this issue pertain to?

fusion-annoFuse.tsv.gz

What release are you using?

v13-v15

Put your question or report your issue here.

Per the comment below, some fusions are duplicated in the fusion-annoFuse.tsv.gz OPC release file. The only difference between the duplicated rows appears to be Gene1A_anno. I only started using this file to derive putative-oncogenic.tsv as of v13, so I am not sure whether this has been happening from the beginning. I am also not sure of the extent (how many fusions are affected) or whether this happens at the patient level or upon merge. I suspect the patient level, since I do not think any analysis is run prior to merge. The arriba file for the patient below has only one entry for this fusion. Can someone investigate the cause of the duplicate gene annotation rows?

@jharenza, not sure what to do about this. In the fusions file, there are repeat entries that are slightly different. For example:

BS_0HW7W7SD NSD3--TRIP12 8:38347497 2:229785855 NA NA NSD3 NA TRIP12 NA CosmicCensus, Oncogene NA NA NA NA ARRIBA 1 FALSE PT_C73C5BBZ [INTERCHROMOSOMAL[chr8--chr2]], translocation Genic in-frame

BS_0HW7W7SD NSD3--TRIP12 8:38347497 2:229785855 NA NA NSD3 NA TRIP12 NA TumorSuppressorGene NA NA NA NA ARRIBA 1 FALSE PT_C73C5BBZ [INTERCHROMOSOMAL[chr8--chr2]], translocation Genic in-frame

Exact same call from same caller, but it seems Gene1A_anno is somehow different... The cbio validation script reports 810 instances of this.

Originally posted by @migbro in https://github.com/d3b-center/bixu-tracker/issues/2248#issuecomment-1985959937
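
To gauge how many fusions are affected, one option is to group rows on every column except Gene1A_anno and flag groups with more than one row. The snippet below is only a rough sketch of that check, not the cbio validation script. Apart from `Gene1A_anno`, the column names in the toy table are hypothetical stand-ins, and the third row is made-up filler data.

```python
import csv
import io
from collections import defaultdict


def find_annotation_duplicates(rows, anno_col="Gene1A_anno"):
    """Return groups of rows that are identical except for `anno_col`.

    The result maps a key (all other column/value pairs) to the list of
    annotation values seen for that key, keeping only keys seen more than once.
    """
    groups = defaultdict(list)
    for row in rows:
        # Key on every column except the annotation column.
        key = tuple((col, val) for col, val in row.items() if col != anno_col)
        groups[key].append(row[anno_col])
    return {key: annos for key, annos in groups.items() if len(annos) > 1}


# Toy table: column names other than Gene1A_anno are assumptions,
# and the BS_EXAMPLE row is invented filler.
SAMPLE_TSV = (
    "Sample\tFusionName\tCaller\tGene1A_anno\n"
    "BS_0HW7W7SD\tNSD3--TRIP12\tARRIBA\tCosmicCensus, Oncogene\n"
    "BS_0HW7W7SD\tNSD3--TRIP12\tARRIBA\tTumorSuppressorGene\n"
    "BS_EXAMPLE\tGENEA--GENEB\tARRIBA\tNA\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t"))
duplicates = find_annotation_duplicates(rows)

for key, annos in duplicates.items():
    print(dict(key)["FusionName"], annos)
```

For the real release file, the same function should work on rows read via `gzip.open(path, "rt")` passed to `csv.DictReader(..., delimiter="\t")`, assuming the file has a header row that includes `Gene1A_anno`.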

Duplicate of https://github.com/d3b-center/bixu-tracker/issues/2325

jharenza commented 2 months ago

jira ticket: https://d3b.atlassian.net/browse/BIXU-2325

jharenza commented 2 months ago

This is fixed in https://github.com/d3b-center/annoFuseData/tree/v1.0.0. We will need to update the docker image and rerun the annoFuse annotation for OPC.