gavinha / TitanCNA

Analysis of subclonal copy number alterations (CNA) and loss of heterozygosity (LOH) in cancer
GNU General Public License v3.0

homozygous deletion (CDKN2A) being classified as HLAMP #56

Closed fpbarthel closed 5 years ago

fpbarthel commented 5 years ago

Hi Gavin,

Apologies for raising so many issues. I completely understand if you don't have time to address everything. I hope I am contributing in some way to improving this excellent software.

I've found some samples that show a homozygous deletion in CDKN2A but that TITAN classifies as HLAMP, despite getting the corrected_copy_number and logr_copy_number correct. Do you know why this could be happening?

| pair_barcode | chrom | pos | num_snp | median_ratio | median_logr | titan_state | titan_call | copy_number | minor_cn | major_cn | clonal_cluster | cellular_prevalence | logr_copy_number | corrected_copy_number | corrected_call |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| xxx | 9 | [21900599,21997016) | 26 | 0.854816 | -0.778442 | 20 | ALOH | 8 | 0 | 8 | 3 | 0.21310998 | 0.015625 | 0 | HLAMP |
| TCGA-14-1402 | 9 | [211762,34639475) | 22534 | 0.862745 | -0.770525 | 20 | ALOH | 8 | 0 | 8 | 1 | 0.99649755 | 0.24458808 | 0 | HLAMP |
| xxx | 9 | [334016,35884107) | 690 | 0.871795 | -0.68648 | 20 | ALOH | 8 | 0 | 8 | 3 | 0.42151919 | 0.55352253 | 1 | HLAMP |

The second sample is a TCGA sample and I can share it as needed.

Floris

P.S. This issue raises another question in my mind: how good is TITAN at detecting homozygous deletions, which are common in, e.g., CDKN2A? I ask because I imagine that, in such a region, the tumor read depth at the sites that are heterozygous in the matched normal may not meet the minimum threshold set by TITAN. Any thoughts on this?

gavinha commented 5 years ago

Hi @fpbarthel

I want to thank you for your continued interest in TitanCNA and helping to bring some of these issues to my and the community's attention.

In the original TitanCNA run, prior to correction of copy number, this is an issue of label-switching. Usually, solutions that show something like this should have much lower likelihoods, but sometimes they creep through. This happens because these data points do not fall within the solution's estimated parameters. For example, if the tumor content is estimated to be low but these data points have a stronger signal than is expected for that tumor content, then the data do not necessarily fit the HOMD log ratio distribution.

That said, I wonder if there is an alternative solution that does not have this problem.
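To make the tumor-content argument concrete, here is a minimal sketch of the expected log ratio of a homozygous deletion under a simplified purity/ploidy mixture model. This is an assumption for illustration, not TitanCNA's actual emission model; the function name `expected_log_ratio` and its parameters are hypothetical.

```python
import math


def expected_log_ratio(tumor_cn, purity, ploidy=2.0, normal_cn=2):
    """Expected log2 ratio of a segment under a simplified mixture model.

    Assumption: the signal is a purity-weighted mix of normal and tumor
    copy number, normalized by the purity-weighted genome-wide average.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    segment_signal = (1.0 - purity) * normal_cn + purity * tumor_cn
    genome_average = (1.0 - purity) * normal_cn + purity * ploidy
    if segment_signal == 0:
        # Pure tumor with zero copies: no reads expected at all.
        return float("-inf")
    return math.log2(segment_signal / genome_average)


# Roughly the median_logr reported for the CDKN2A segments in this issue.
observed = -0.78

for purity in (0.2, 0.4, 0.6):
    homd = expected_log_ratio(0, purity)
    relation = "stronger" if observed < homd else "weaker"
    print(f"purity={purity:.1f} expected HOMD logR={homd:.3f} "
          f"(observed {observed} is {relation} than expected)")
```

Under these assumptions, a low purity estimate gives a fairly shallow expected HOMD log ratio (about -0.32 at purity 0.2), so an observed value near -0.78 is a stronger signal than the HOMD state predicts, and the model may end up assigning the segment to a different state.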

The correctIntegerCN function should fix this, and it has, since corrected_copy_number is 0. However, it appears to be a bug that corrected_call is not using the correct label, HOMD.

I'll look into this and fix it soon.
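In the meantime, affected segments can be flagged after the fact. The sketch below is a hypothetical post-processing check, not part of TitanCNA: it takes rows shaped like the table in this issue (only the relevant columns are kept) and reports those where corrected_copy_number is 0 but corrected_call is not HOMD. The helper name `find_inconsistent_calls` is an assumption.

```python
def find_inconsistent_calls(rows):
    """Return rows with a corrected copy number of 0 whose call is not HOMD."""
    return [
        row for row in rows
        if int(row["corrected_copy_number"]) == 0
        and row["corrected_call"] != "HOMD"
    ]


# The three segments from this issue, reduced to the columns that matter here.
rows = [
    {"pair_barcode": "xxx", "pos": "[21900599,21997016)",
     "corrected_copy_number": 0, "corrected_call": "HLAMP"},
    {"pair_barcode": "TCGA-14-1402", "pos": "[211762,34639475)",
     "corrected_copy_number": 0, "corrected_call": "HLAMP"},
    {"pair_barcode": "xxx", "pos": "[334016,35884107)",
     "corrected_copy_number": 1, "corrected_call": "HLAMP"},
]

flagged = find_inconsistent_calls(rows)
for row in flagged:
    print(f"{row['pair_barcode']} {row['pos']}: copy number 0 "
          f"but called {row['corrected_call']}")
```

The third segment has a corrected_copy_number of 1, so this check leaves it alone; only the zero-copy case, whose expected label is stated in this thread, is tested.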

Thanks, Gavin