AlexsLemonade / OpenPBTA-analysis

The analysis repository for the Open Pediatric Brain Tumor Atlas Project

Proposed Analysis: Review 50% reciprocal overlap in CNV consensus analysis #1125

Closed kgaonkar6 closed 3 years ago

kgaonkar6 commented 3 years ago

What analysis module should be updated and why?

We believe the requirement that CNVs have 50% reciprocal overlap might be too stringent. After #1116, we are missing some subtype-defining CNVs.

What changes need to be made? Please provide enough detail for another participant to make the update.

After investigation, we see these CNV alterations in both controlfreec and cnvkit. For example, the chr19 amplification in BS_K07KNTFY is called by both controlfreec and cnvkit but is missing from the consensus calls because the cnvkit region is only about 11% of the controlfreec region:

BS_K07KNTFY.cnvkit.dup.filtered3.bed: chr19 54138551    54427104
BS_K07KNTFY.freec.dup.filtered3.bed:  chr19 53641020    56141391
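
As a rough check of the 11% figure, the sketch below computes how much of each call is covered by the other. The helper name `overlap_fractions` is hypothetical and not part of the module, and the coordinates are treated as inclusive, which is an assumption made to match the `+ 1` used in the script's length calculations.

```python
def overlap_fractions(start1, end1, start2, end2):
    """Return (fraction of interval 1 covered, fraction of interval 2 covered).

    Hypothetical helper; coordinates are assumed to be inclusive.
    """
    start = max(start1, start2)
    end = min(end1, end2)
    # Clamp at 0 so non-overlapping intervals give 0 coverage
    overlap = max(0, end - start + 1)
    len1 = end1 - start1 + 1
    len2 = end2 - start2 + 1
    return overlap / len1, overlap / len2


# chr19 calls for BS_K07KNTFY, taken from the two BED lines above
cnvkit = (54138551, 54427104)
freec = (53641020, 56141391)

cov_cnvkit, cov_freec = overlap_fractions(*cnvkit, *freec)
print(f"{cov_cnvkit:.3f} {cov_freec:.3f}")  # prints "1.000 0.115"
```

The cnvkit call lies entirely inside the controlfreec call, so it is fully covered, while only about 11.5% of the controlfreec call is covered. A 50% reciprocal threshold therefore rejects the pair.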

What was your approach?

Our approach was to broaden the criteria at this step in master so that a CNV call from either caller is included when it has any overlap with a call from the other caller: https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/2511e8c9fc4a7542f5b709363f866ccddb73be8b/analyses/copy_number_consensus_call/scripts/compare_variant_calling_updated.py#L148-L152

consensus-cnv-smallCNV-overlap

                        ## For list2's CNV
                        ## If any overlap exists,
                        ## then we add in the start, end coordinate, total overlap length, and total len to different lists
                        ## This is done to account for 1 CNV from list1 overlapping with MULTIPLE CNVs from list2
                        if (end - start + 1) / (end_list2 - start_list2 + 1) >= 0:

At the following snippet in master, we additionally accept a pair of calls when a CNV from one caller is at least 90% covered by a CNV from the other caller. This captures a smaller CNV in caller X that is almost completely overlapped by a larger CNV in caller Y: https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/2511e8c9fc4a7542f5b709363f866ccddb73be8b/analyses/copy_number_consensus_call/scripts/compare_variant_calling_updated.py#L181

consensus-cnv-smallCNV-overlap

if (coverage_list1 >= 0.5 and coverage_list2 >= 0.5) or (coverage_list1 >= 0.9 and coverage_list2 > 0) or (coverage_list1 > 0 and coverage_list2 >= 0.9):
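
To see how this condition behaves, here is a minimal sketch that wraps it in a standalone predicate. The function name `passes_relaxed_criterion` is hypothetical, and `coverage_list1` / `coverage_list2` are assumed to be the fractions of each caller's CNV covered by the other caller's calls.

```python
def passes_relaxed_criterion(coverage_list1, coverage_list2):
    """Hypothetical wrapper around the proposed consensus condition."""
    # Original rule: 50% reciprocal overlap
    reciprocal = coverage_list1 >= 0.5 and coverage_list2 >= 0.5
    # Proposed additions: one call is >= 90% covered and the other has any overlap
    list1_contained = coverage_list1 >= 0.9 and coverage_list2 > 0
    list2_contained = coverage_list1 > 0 and coverage_list2 >= 0.9
    return reciprocal or list1_contained or list2_contained


# Coverage values approximating the chr19 example (cnvkit fully covered,
# controlfreec about 11% covered): rejected by the old rule, accepted by the new one
print(passes_relaxed_criterion(1.0, 0.11))  # True
# Partial overlap on both sides that meets neither rule is still rejected
print(passes_relaxed_criterion(0.6, 0.3))   # False
```

Under this sketch, the 50% reciprocal rule is kept as one of three alternatives, so every pair accepted before the change is still accepted.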

What input data should be used? Which data were used in the version being updated?

pbta-cnv-cnvkit.seg.gz pbta-cnv-controlfreec.tsv.gz pbta-sv-manta.tsv.gz

When do you expect the revised analysis will be completed?

1 day

Who will complete the updated analysis?

@kgaonkar6

kgaonkar6 commented 3 years ago

Closing with #1123