d3b-center / OpenPedCan-analysis

The analysis repository for the Open Pediatric Cancer Project
https://d3b-center.github.io/OpenPedCan-analysis/

`cnv-controlfreec-tumor-only.tsv.gz` status field capitalized but other CNV files lowercase #557

Open jharenza opened 4 months ago

jharenza commented 4 months ago

What data file(s) does this issue pertain to?

cnv-controlfreec-tumor-only.tsv.gz

What release are you using?

v13-v15

Put your question or report your issue here.

The `status` column in the freec tumor-only CNV file has capitalized values (`Gain`, `Loss`, `Neutral`), whereas the values in the tumor/normal (t/n) files are all lowercase. This will break every module that matches `status` case-sensitively, as well as the pedcbio load, so the file should be updated and the affected release modules rerun.

> table(freec_tumor_only$status)

   Gain    Loss Neutral 
 158089   63256  120994 

> table(freec$status)

   gain    loss neutral 
2159530  923876  577559 
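
One possible stopgap until the file is re-merged is to lowercase the column on read. The following is a minimal sketch, not the project's actual fix: the function name `normalize_status` and the output path are hypothetical, and it assumes only that the file is a gzipped TSV with a column named `status`.

```python
import csv
import gzip
from collections import Counter


def normalize_status(in_path, out_path, column="status"):
    """Copy a gzipped TSV, lowercasing one column.

    Returns a Counter of the normalized values, comparable to the
    `table()` output shown above. `column` defaults to "status";
    every other column is written back unchanged.
    """
    counts = Counter()
    with gzip.open(in_path, "rt", newline="") as src, \
            gzip.open(out_path, "wt", newline="") as dst:
        reader = csv.DictReader(src, delimiter="\t")
        writer = csv.DictWriter(
            dst,
            fieldnames=reader.fieldnames,
            delimiter="\t",
            lineterminator="\n",
        )
        writer.writeheader()
        for row in reader:
            # "Gain" -> "gain", "Loss" -> "loss", "Neutral" -> "neutral"
            row[column] = row[column].strip().lower()
            counts[row[column]] += 1
            writer.writerow(row)
    return counts
```

Calling `normalize_status("cnv-controlfreec-tumor-only.tsv.gz", "cnv-controlfreec-tumor-only.lower.tsv.gz")` (the second filename is illustrative) would write a lowercased copy and return the per-status counts for a quick sanity check against the t/n file.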

jharenza commented 4 months ago

The tumor-only CNV files were merged in the hope project, but the original files (e.g.) have lowercase status values, so it is unclear where the capitalization came from. We can ask @zhangb1's team to do a new merge for this cohort.

Furthermore, this file, as well as `snv-mutect2-tumor-only-plus-hotspots.maf.tsv.gz`, seems to contain all hope samples rather than only the intended tumor-only samples.

jharenza commented 2 weeks ago

Noting here: I think we may also need to assign amplification and deep deletion statuses to the freec calls.
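
As a rough illustration of what that assignment could look like, here is a sketch under loudly stated assumptions: the thread does not define the cutoffs, so the rule used here (copy number 0 is a deep deletion, copy number greater than twice the ploidy is an amplification) is an assumed convention, and the function name `refine_status` and its parameters are hypothetical.

```python
def refine_status(status, copy_number, ploidy):
    """Refine a freec status call into a finer category.

    ASSUMED rule (not confirmed in this thread):
      - copy_number == 0          -> "deep deletion"
      - copy_number > 2 * ploidy  -> "amplification"
      - otherwise                 -> the original status, lowercased
    """
    status = status.strip().lower()
    if copy_number == 0:
        return "deep deletion"
    if copy_number > 2 * ploidy:
        return "amplification"
    return status
```

Under these assumed cutoffs, a segment with copy number 9 in a diploid sample would be relabeled `amplification`, while a copy number of 3 would remain `gain`. Lowercasing inside the function keeps the output consistent with the t/n files regardless of the input case.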