d3b-center / ticket-tracker-OPC

A repo to generate and track tickets for ped OT

Updated analysis: Fusion filtering #533

migbro closed this issue 1 year ago

migbro commented 1 year ago

What analysis module should be updated and why?

Assuming I understand correctly, this one: 05-QC_putative_onco_fusion_distribution.Rmd. It attempts to collapse hits in which both callers called the same breakpoint, but the collapse fails in some cases. The code in question seems to be here: https://github.com/PediatricOpenTargets/OpenPedCan-analysis/blob/6e53918a1cf33c190082665df1ecf833efc96b23/analyses/fusion_filtering/05-QC_putative_onco_fusion_distribution.Rmd#L240-L250

What changes need to be made? Please provide enough detail for another participant to make the update.

Currently, some collapsing does occur, e.g.:

BS_00FD2KMP 3:160356013 14:67809289 IFT80--ZFYVE26 in-frame IFT80 NA ZFYVE26 NA NA Oncogene NA NA NA STARFUSION, ARRIBA 2 NA NA FALSE PT_SW4Q1HZP [INTERCHROMOSOMAL[chr3--chr14]], translocation Genic

However, other instances are not collapsed:

BS_00FD2KMP     6:158818082     6:4431475       EZR--ENSG00000285424    frameshift      EZR     NA      ENSG00000285424 NA      CosmicCensus    NA      NA      NA      NA      ARRIBA  1       NA      NA      FALSE   PT_SW4Q1HZP     [INTRACHROMOSOMAL[chr6:154.18Mb]], deletion     Genic
BS_00FD2KMP     6:158818082     6:4431475       EZR--ENSG00000285424    other   EZR     NA      ENSG00000285424 NA      CosmicCensus    NA      NA      NA      NA      STARFUSION      1       NA      NA      FALSE   PT_SW4Q1HZP     [INTRACHROMOSOMAL[chr6:154.18Mb]]       Genic
BS_028YFYJ6     3:53846648      3:39169937      IL17RB--ENSG00000284669 frameshift      IL17RB  NA      ENSG00000284669 NA      Oncogene        NA      NA      NA      NA      ARRIBA  1       NA      NA      FALSE   PT_394ZA6P7     [INTRACHROMOSOMAL[chr3:14.67Mb]], duplication   Genic
BS_028YFYJ6     3:53846648      3:39169937      IL17RB--ENSG00000284669 other   IL17RB  NA      ENSG00000284669 NA      Oncogene        NA      NA      NA      NA      STARFUSION      1       NA      NA      FALSE   PT_394ZA6P7     [INTRACHROMOSOMAL[chr3:14.67Mb]]        Genic

These are two examples of the same breakpoint reported on separate lines, one from each caller (ARRIBA and STARFUSION), when I believe the intent is to merge them. The difference seems to be in Fusion_Type (frameshift from ARRIBA vs. other from STARFUSION). Recommendation: merge these calls and use the Fusion_Type assigned by ARRIBA.
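To make the recommendation concrete, here is a minimal sketch of the merge, written in Python rather than the R used by the linked Rmd. It assumes rows are grouped on sample, both breakpoints, and fusion name, and that when both callers report a breakpoint the ARRIBA row supplies Fusion_Type and the annotation. The field names (`sample`, `left_bp`, `right_bp`, `fusion_name`, `caller`, `fusion_type`, `annots`), `CALLER_PRIORITY`, and `collapse_caller_calls` are hypothetical illustrations, not the analysis code.

```python
# Hypothetical caller priority: ARRIBA's Fusion_Type and annotation win when both
# callers report the same breakpoint; callers not listed here sort last.
CALLER_PRIORITY = {"ARRIBA": 0, "STARFUSION": 1}


def caller_rank(row):
    return CALLER_PRIORITY.get(row["caller"], len(CALLER_PRIORITY))


def collapse_caller_calls(rows):
    """Merge rows that share sample, both breakpoints and fusion name.

    Each row is a dict with the hypothetical keys sample, left_bp, right_bp,
    fusion_name, caller, fusion_type and annots.
    """
    groups = {}  # dicts keep insertion order (Python 3.7+), so output follows input order
    for row in rows:
        key = (row["sample"], row["left_bp"], row["right_bp"], row["fusion_name"])
        groups.setdefault(key, []).append(row)

    merged = []
    for (sample, left_bp, right_bp, fusion_name), group in groups.items():
        # sorted() is stable, so rows with equal priority keep their input order.
        ranked = sorted(group, key=caller_rank)
        best = ranked[0]
        # Unique callers, listed in priority order.
        callers = list(dict.fromkeys(r["caller"] for r in ranked))
        merged.append({
            "sample": sample,
            "left_bp": left_bp,
            "right_bp": right_bp,
            "fusion_name": fusion_name,
            "fusion_type": best["fusion_type"],
            "annots": best["annots"],
            "caller": ", ".join(callers),
            "caller_count": len(callers),
        })
    return merged


if __name__ == "__main__":
    rows = [
        {"sample": "BS_00FD2KMP", "left_bp": "6:158818082", "right_bp": "6:4431475",
         "fusion_name": "EZR--ENSG00000285424", "caller": "STARFUSION",
         "fusion_type": "other", "annots": "[INTRACHROMOSOMAL[chr6:154.18Mb]]"},
        {"sample": "BS_00FD2KMP", "left_bp": "6:158818082", "right_bp": "6:4431475",
         "fusion_name": "EZR--ENSG00000285424", "caller": "ARRIBA",
         "fusion_type": "frameshift",
         "annots": "[INTRACHROMOSOMAL[chr6:154.18Mb]], deletion"},
    ]
    for row in collapse_caller_calls(rows):
        # prints: EZR--ENSG00000285424 frameshift ARRIBA, STARFUSION 2
        print(row["fusion_name"], row["fusion_type"], row["caller"], row["caller_count"])
```

Run on the two EZR rows above, this yields a single row with Fusion_Type frameshift and both callers, matching how the IFT80--ZFYVE26 hit is already collapsed (the caller order differs only cosmetically).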

What input data should be used? Which data were used in the version being updated?

Whatever input is normally given for this step

When do you expect the revised analysis will be completed?

ASAP

Who will complete the updated analysis?

Unknown

migbro commented 1 year ago

@kelseykeith I think Jo Lynne assigned you to the wrong ticket; I just made the adjustment. It is related to #509, but this one actually describes the work.

jharenza commented 1 year ago

This may stem from kinases with multiple kinase domains, where one domain is retained and another is not. We cross-posted here, but the immediate fix will be to collapse these as "Yes,No" in the domain-retained columns.
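A minimal sketch of that immediate fix, assuming rows for the same fusion differ only in their domain-retained columns. The column name `gene1a_domain_retained` and the helpers `collapse_flags` and `collapse_domain_rows` are hypothetical; the real column names in the fusion-filtering output may differ, and the analysis itself is in R.

```python
# Fixed display order so merged values read "Yes,No" regardless of row order.
FLAG_ORDER = {"Yes": 0, "No": 1}


def collapse_flags(values):
    """Collapse per-domain flags, e.g. ["No", "Yes", "No"] -> "Yes,No".

    Missing values (None or "NA") are dropped; returns "NA" if nothing remains.
    """
    distinct = {v for v in values if v is not None and v != "NA"}
    if not distinct:
        return "NA"
    return ",".join(sorted(distinct, key=lambda v: (FLAG_ORDER.get(v, len(FLAG_ORDER)), v)))


def collapse_domain_rows(rows, domain_cols):
    """Merge rows that are identical except for the given domain-retained columns."""
    groups = {}  # insertion-ordered, so output follows input order
    for row in rows:
        # Group on every other column; column names are unique, so only keys are compared.
        key = tuple(sorted((k, v) for k, v in row.items() if k not in domain_cols))
        groups.setdefault(key, []).append(row)

    merged = []
    for key, group in groups.items():
        out = dict(key)
        for col in domain_cols:
            out[col] = collapse_flags(r.get(col) for r in group)
        merged.append(out)
    return merged
```

Fixing the order ("Yes" before "No") keeps the collapsed value stable no matter which domain's row appears first in the input.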