d3b-center / OpenPedCan-analysis

The analysis repository for the Open Pediatric Cancer Project
https://d3b-center.github.io/OpenPedCan-analysis/

Updated analysis: consensus CN - XY chrs have artifactual calls #589

Closed jharenza closed 4 months ago

jharenza commented 4 months ago

What analysis module should be updated and why?

consensus_wgs_plus_cnvkit_wxs_plus_freec_tumor_only.tsv.gz (consensus CN): many X/Y genes are showing as deleted in M/F because the chr is missing, not because they are really deleted. E.g., EZHIP (chr X).

What changes need to be made? Please provide enough detail for another participant to make the update.

I speculate, but have not checked, that when we brought in GATK, we needed to adjust the CN based on sex/gender, such as we do with cnvkit, to match freec.
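To make the idea of a sex-based adjustment concrete, here is a minimal sketch of how an expected copy number for X and Y could be derived from sex before a call is labeled. The function names `expected_copies` and `cn_status` and the status labels are hypothetical, chosen for illustration only; this is not how CNVkit, GATK, or ControlFreeC implement it.

```python
def expected_copies(chrom, sex):
    """Expected germline copy number for a chromosome, given sex.

    Hypothetical helper: returns None when sex is unknown and the
    chromosome is X or Y, because the baseline cannot be determined.
    """
    name = chrom[3:] if chrom.lower().startswith("chr") else chrom
    name = name.upper()
    sex = sex.strip().lower()
    if name == "X":
        return {"male": 1, "female": 2}.get(sex)
    if name == "Y":
        return {"male": 1, "female": 0}.get(sex)
    return 2  # autosomes


def cn_status(copy_number, chrom, sex):
    """Label a call relative to the sex-aware baseline (labels are illustrative)."""
    expected = expected_copies(chrom, sex)
    if expected is None:
        return "unknown"
    if expected == 0 and copy_number == 0:
        return None  # chromosome not expected to be present: no annotation
    if copy_number < expected:
        return "loss"
    if copy_number > expected:
        return "gain"
    return "neutral"
```

Under this sketch, one copy of an X gene in a male is neutral rather than a loss, and zero copies of a Y gene in a female produce no annotation at all.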

What input data should be used? Which data were used in the version being updated?

When do you expect the revised analysis will be completed?

Who will complete the updated analysis?

jharenza commented 4 months ago

Upon further inspection, this is possibly being caused by the CNVkit WXS samples. We could swap in freec for these if it handles this better.

jharenza commented 4 months ago

Indeed, the XY WXS calls from CNVkit have more artifacts than freec, so we will switch over to freec wxs for future releases. See PR #591

Created a ticket for the dev team on internal JIRA: https://d3b.atlassian.net/browse/BIXU-3701

From this ticket:

WXS samples generally do not have a WGS sample on which germline_sex_estimate is performed. This sex estimate was being used as input to the CNV calling (I think) for CNVkit, and possibly also ControlFreeC. For samples without a sex estimate, are we using reported gender? Can we investigate whether this will help with the WXS calls, aligning them more closely with ControlFreeC?
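The fallback being asked about could look roughly like the sketch below: prefer the germline sex estimate and fall back to reported gender only when the estimate is missing or uninformative. The function name `resolve_sex` and the accepted values ("Male", "Female", anything else treated as unknown) are assumptions for illustration, not a description of the current workflow.

```python
_KNOWN = {"male": "Male", "female": "Female"}


def resolve_sex(germline_sex_estimate, reported_gender):
    """Pick the sex to pass to CNV calling (hypothetical helper).

    Tries the germline estimate first, then reported gender; any value
    other than Male/Female (None, "", "Unknown", ...) is skipped.
    """
    for value in (germline_sex_estimate, reported_gender):
        if value is None:
            continue
        key = str(value).strip().lower()
        if key in _KNOWN:
            return _KNOWN[key]
    return "Unknown"
```

For a WXS sample with no estimate, `resolve_sex(None, "Female")` would return "Female", so the caller still gets a usable baseline for X and Y.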

For example, there are 242 WXS bs ids belonging to females that have a copy number of 0 for SRY, a Y-specific gene, and are annotated as deep deletion, rather than having no annotation for this gene, which is not present in females. In the FreeC runs, this deletion is not present, which is the outcome we expect.

wxs_female_bs_cnvkit_sry_del.txt
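A check like the one described above can be sketched as a simple filter over a per-gene CN table. The column names (`biospecimen_id`, `gene_symbol`, `copy_number`, `status`, `sex`) and the toy rows are assumptions made up for this example; they are not taken from the consensus file or the attached list.

```python
import csv
import io

Y_GENES = {"SRY"}  # Y-specific genes to check; SRY is the example from the ticket


def female_y_deletions(rows, y_genes=Y_GENES):
    """Return biospecimen ids of females with a Y gene called as deep deletion."""
    flagged = []
    for row in rows:
        if (
            row["sex"].strip().lower() == "female"
            and row["gene_symbol"] in y_genes
            and int(row["copy_number"]) == 0
            and row["status"].strip().lower() == "deep deletion"
        ):
            flagged.append(row["biospecimen_id"])
    return flagged


# Toy table with made-up ids, only to show the expected input shape.
toy_rows = [
    ["biospecimen_id", "gene_symbol", "copy_number", "status", "sex"],
    ["BS_TOY0001", "SRY", "0", "deep deletion", "Female"],
    ["BS_TOY0002", "SRY", "0", "deep deletion", "Male"],
    ["BS_TOY0003", "EZHIP", "1", "loss", "Female"],
]
toy_tsv = "\n".join("\t".join(r) for r in toy_rows)

rows = list(csv.DictReader(io.StringIO(toy_tsv), delimiter="\t"))
flagged = female_y_deletions(rows)
```

Only the first toy row is flagged: the second is a male (a real deletion is possible there), and the third is not a Y gene.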

I noticed this when EZHIP (chr X) was deleted in many samples (mainly males) in our consensus file and on pedcbio in the OPC project, but not deleted in the pbta_all project, which uses freec only.