Open lbeltrame opened 5 years ago
Hi @lbeltrame
Is this a selected solution made by selectSolutions.R? If so, then some changes in issue #61 might help.
Is this a single run with ploidy initialized to 2? If so, can I also see the genome-wide LOH plot of allelic fractions?
Thanks, Gavin
Yes, it's made via selectSolutions, from a run that tested ploidies 2, 3, and 4 and clusters from 1 to 5. The only difference is that the bins come from either CNVkit or GATK CNV rather than readCounter.
Here is the LOH plot (note that it is from a different run, because I was tweaking the segmentation parameters, but this occurs in all cases):
Note that another solution, which is not selected, is closer to the actual sample (mostly CN neutral, some LOH).
Thanks, @lbeltrame.
I think the commits associated with #61 might help.
Also, what value are you setting for --alphaK in titanCNA.R? For WES data, try using a value between 1000 and 3000.
Alternatively, you can adjust the --threshold in selectSolutions.R so that it weights the ploidy 2 solution more.
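To illustrate the idea of weighting the ploidy 2 solution more, here is a minimal Python sketch of a threshold-based choice between ploidy runs. It is not the code in selectSolutions.R: the function name `pick_ploidy`, the "higher score is better" convention, and the example scores are all assumptions made for illustration. The only point it shows is that a larger threshold makes it harder for a ploidy 3 or 4 solution to displace the ploidy 2 one.

```python
def pick_ploidy(scores, threshold=0.05):
    """Pick an initial-ploidy run, favouring ploidy 2.

    scores: dict mapping initial ploidy -> score, where a HIGHER score is
            assumed to be better (hypothetical convention, e.g. a log-likelihood).
    threshold: minimum fractional improvement over the ploidy 2 score that a
               higher-ploidy run needs before it is preferred.
    Assumes scores[2] exists and is non-zero.
    """
    base = scores[2]
    best_ploidy, best_gain = 2, 0.0
    for ploidy, score in scores.items():
        if ploidy == 2:
            continue
        # Fractional improvement relative to the diploid run.
        gain = (score - base) / abs(base)
        if gain > threshold and gain > best_gain:
            best_ploidy, best_gain = ploidy, gain
    return best_ploidy


# Made-up scores: ploidy 3 is 2% better than ploidy 2, ploidy 4 is 1% better.
example_scores = {2: -1000.0, 3: -980.0, 4: -990.0}
print(pick_ploidy(example_scores, threshold=0.05))  # stays with ploidy 2
print(pick_ploidy(example_scores, threshold=0.01))  # ploidy 3 now wins
```

With the made-up scores above, raising the threshold from 0.01 to 0.05 is enough to keep the diploid solution, which is the effect being suggested here.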
I'm using 2500 for --alphaK as per the recommendations for WES. I found out that the original sample had a very noisy reference for CN, so I switched it and I'm trying again. I'll then try with the latest master to see if it improves.
I've had the opportunity to test this again. selectSolution selects the only "wrong" solution (2 clusters, but reports only 1 as numClust in the output) which has this shift (tested with the latest code). Other solutions (1 cluster, 3-5 clusters) all correctly report the whole genome as without CNAs.
From other evidence, the solution with one cluster seems the most correct biologically, but it has a slightly lower statistic than the one with 2, which gets picked.
I have a sample that is supposed to be mostly copy number neutral (no large events detected in a sWGS study). However, when I run the same sample (done with WES in this case, ~180X coverage) with TitanCNA using the output from GATK (sample vs its matched normal), I see graphs like this one for some solutions:
which, at least in my case, is not correct. It looks like the "baseline" for the log2 ratio has been shifted to around 0.1-0.2. It doesn't help that the data is quite noisy.
Note that I might be running an older version of TitanCNA (whatever is available in bioconda).
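To put the baseline shift described above into numbers, here is a small Python sketch. It is not taken from TitanCNA or GATK. It assumes a pure (100% tumor) sample with a diploid baseline, and the helper names `implied_copy_number` and `median_center` are hypothetical. The first function converts a log2 ratio into the copy number it would imply under those assumptions; the second estimates the shift as the median log2 ratio across bins and subtracts it, which is only reasonable if most of the genome really is copy number neutral.

```python
import statistics


def implied_copy_number(log2_ratio, ploidy=2.0):
    """Copy number implied by a log2 ratio, assuming 100% purity (an assumption)."""
    return ploidy * 2 ** log2_ratio


def median_center(log2_ratios):
    """Return (estimated shift, re-centred values) using the median as baseline."""
    shift = statistics.median(log2_ratios)
    return shift, [r - shift for r in log2_ratios]


# A shift of 0.1-0.2 in log2 ratio corresponds to roughly 2.14-2.30 copies
# under the pure, diploid assumption.
for shift in (0.1, 0.2):
    print(shift, round(implied_copy_number(shift), 2))

# Made-up bin values centred around 0.15 instead of 0.
bins = [0.10, 0.12, 0.15, 0.18, 0.20]
estimated_shift, centred = median_center(bins)
print(estimated_shift, centred)
```

Under these assumptions the shift is small in absolute copy number terms, so in noisy data it could plausibly be enough to tip a solution toward calling a genome-wide gain.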