gavinha / TitanCNA

Analysis of subclonal copy number alterations (CNA) and loss of heterozygosity (LOH) in cancer
GNU General Public License v3.0

Ratios shifted upwards in CN segment plot #75

Open lbeltrame opened 5 years ago

lbeltrame commented 5 years ago

I have a sample that is supposed to be mostly copy number neutral (no large events detected in a sWGS study). However, when I run the same sample (done with WES in this case, ~180X coverage) with TitanCNA using the output from GATK (sample vs its matched normal), I see graphs like this one for some solutions:

Screenshot_20190725_141109

which, at least in my case, is not correct. It looks like the "baseline" for the log2 ratio has been shifted to around 0.1-0.2. It doesn't help that the data is quite noisy.
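One rough way to put a number on a shift like this is to take the median of the bin-level log2 ratios. The snippet below is only a minimal sketch, not something TitanCNA does internally: the function names `baseline_offset` and `recenter` and the example values are hypothetical, and the approach assumes that most of the genome is copy-number neutral, so the median should sit near 0.

```python
from statistics import median


def baseline_offset(log2_ratios):
    """Return the median log2 ratio over all bins that have a value.

    Assumption: most bins are copy-number neutral, so a well-centred
    sample should give a median close to 0.
    """
    values = [x for x in log2_ratios if x is not None]
    if not values:
        raise ValueError("no log2 ratios supplied")
    return median(values)


def recenter(log2_ratios, offset):
    """Subtract the offset from every bin, leaving missing bins as None."""
    return [None if x is None else x - offset for x in log2_ratios]


# Hypothetical bin-level log2 ratios, invented purely for illustration.
example = [0.12, 0.18, 0.15, 0.09, 0.21, 0.14, -0.85, 0.16]

offset = baseline_offset(example)    # estimated baseline shift
centred = recenter(example, offset)  # same bins with the shift removed
```

For a sample that really is aneuploid, the median is not a safe estimate of the neutral baseline, so this is best treated as a diagnostic on the input bins rather than as a correction.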

Note that I might be running an older version of TitanCNA (whichever one is available in bioconda).

gavinha commented 5 years ago

Hi @lbeltrame

Is this a solution selected by selectSolutions.R? If so, then some changes in issue #61 might help. Or is this a single run with ploidy initialized to 2? If so, could I also see the genome-wide LOH plot of allelic fractions?

Thanks, Gavin

lbeltrame commented 5 years ago

Yes, it's made via selectSolutions, from a run that tested ploidies 2, 3, and 4 and clusters from 1 to 5. The only difference is that the bins come from either CNVkit or GATK CNV rather than readCounter.

Here is the LOH plot (note that it is from a different run, because I was tweaking the segmentation parameters, but this occurs in all cases):

Screenshot_20190726_091131

Note that another solution, which is not selected, is closer to the actual sample (mostly CN neutral, some LOH).

gavinha commented 5 years ago

Thanks, @lbeltrame.

I think the commits associated with #61 might help. Also, what value are you setting for --alphaK in titanCNA.R? For WES data, try using a value between 1000 and 3000.

Alternatively, you can adjust the --threshold in selectSolutions.R so that it weights the ploidy 2 solution more.

lbeltrame commented 5 years ago

I'm using 2500 for --alphaK as per the recommendations for WES. I found out that the original sample had a very noisy reference for CN, so I switched it and I'm trying again. I'll then try with the latest master to see if it improves.

lbeltrame commented 5 years ago

I've had the opportunity to test this again. selectSolution selects the only "wrong" solution (2 clusters, although the output reports numClust as 1), which has this shift (tested with the latest code). The other solutions (1 cluster, 3-5 clusters) all correctly report the whole genome as free of CNAs.

From other evidence, the solution with one cluster seems the most correct biologically, but it has a slightly lower statistic than the one with 2, which gets picked.
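As an illustration of the kind of tie-break that would favour the simpler solution here, the following is a minimal sketch and not the logic of the TitanCNA selection script. It assumes a higher statistic is better (as the behaviour described above suggests), and the names `pick_solution`, `num_clusters`, `score`, and `tolerance`, along with the example scores, are all hypothetical.

```python
def pick_solution(solutions, tolerance=0.05):
    """Prefer the fewest clusters among solutions scoring close to the best.

    solutions: list of dicts with 'num_clusters' and 'score' keys.
    tolerance: relative margin below the best score that still counts
               as "close" (hypothetical parameter).
    Assumption: a higher score means a better solution.
    """
    if not solutions:
        raise ValueError("no solutions supplied")
    best = max(s["score"] for s in solutions)
    cutoff = best - abs(best) * tolerance
    near_best = [s for s in solutions if s["score"] >= cutoff]
    # Fewest clusters wins; ties are broken by the higher score.
    return min(near_best, key=lambda s: (s["num_clusters"], -s["score"]))


# Hypothetical scores, invented for illustration only.
candidates = [
    {"num_clusters": 1, "score": 0.98},
    {"num_clusters": 2, "score": 1.00},
    {"num_clusters": 3, "score": 0.90},
]

chosen = pick_solution(candidates)               # 1-cluster solution
strict = pick_solution(candidates, tolerance=0)  # 2-cluster solution
```

With a non-zero tolerance the 1-cluster solution is chosen even though its score is slightly lower; with a tolerance of 0 the rule reduces to picking the top score.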