zhouxuzhouxu opened this issue 3 years ago
Hi @zhouxuzhouxu,
Not an author of CNVkit, but could you please specify what kind of "discrepancies" you are facing?
=> Are copy number values lower than expected from your array data? Higher? Or both?
=> Also, does it affect a particular gene, or several?
=> And what is the magnitude of the discrepancies? Are we talking about twice the expected value? More? Less?
I see in your commands (thanks for detailing them, BTW) that you used the simplest parameters for your call step?
=> If you haven't yet, I advise you to read CNVkit's documentation about Tumor Analysis and the details about cnvkit.py call
=> To sum up, the call subcommand has several parameters to adjust copy_ratio values, for example based on your known tumor purity
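To make the purity point concrete, here is a minimal Python sketch (not CNVkit's own code) of two ways a log2 copy ratio can be turned into an integer copy number: fixed cutoffs, in the spirit of call -m threshold, and a purity-corrected estimate, in the spirit of call -m clonal --purity. The cutoff values are what I believe the CNVkit documentation lists as the threshold-method defaults. The function names, the fallback above the last cutoff, and the simple two-component mixture model (diploid normal cells, ploidy 2) are assumptions for illustration.

```python
import math

# Assumed defaults of CNVkit's threshold method; check `cnvkit.py call --help`
# for your installed version.
DEFAULT_THRESHOLDS = (-1.1, -0.25, 0.2, 0.7)


def threshold_call(log2_ratio, thresholds=DEFAULT_THRESHOLDS, ploidy=2):
    """Map a log2 copy ratio to an integer copy number using hard cutoffs.

    A value <= thresholds[0] gives 0 copies, <= thresholds[1] gives 1 copy, etc.
    """
    for copies, cutoff in enumerate(thresholds):
        if log2_ratio <= cutoff:
            return copies
    # Above the last cutoff (sketch choice): scale the ratio by ploidy, round up.
    return max(len(thresholds), math.ceil(ploidy * 2 ** log2_ratio))


def purity_adjusted_copies(log2_ratio, purity, ploidy=2):
    """Estimate tumour copy number from a log2 ratio, correcting for normal cells.

    Assumed mixture model: ratio = (purity * n + (1 - purity) * ploidy) / ploidy,
    solved here for n.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    observed_copies = ploidy * 2 ** log2_ratio
    return (observed_copies - ploidy * (1 - purity)) / purity


# Hypothetical single-copy loss in a sample that is only 30% tumour:
# the expected ratio is (0.3 * 1 + 0.7 * 2) / 2 = 0.85, i.e. log2 of about -0.23.
log2_loss = math.log2(0.85)
print(threshold_call(log2_loss))                      # 2: the loss is missed
print(round(purity_adjusted_copies(log2_loss, 0.3)))  # 1: the loss is recovered
```

Low purity pulls log2 ratios toward zero, which is why fixed cutoffs alone can under-call events in impure samples.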
Hope this helps. Have a nice day. Felix.
Hi Felix, thanks for your quick response. In fact, I used the simplest parameters for the call step, as shown in my original post, and obtained copy numbers for over 20,000 genes in 30 tumor cell lines with unmatched normals. Compared with the copy numbers of these cell lines from array data, the proportion of genes with the same copy number (number of genes with the same copy number / number of overlapping genes) was above 80% in half of the cell lines; in the other half, however, it was below 5%. These results suggest the method is not stable here. I have also read the documentation and do not know what to do. Do you have any suggestions? In addition, could the lack of control (normal) samples affect the final result? Thank you for your time and consideration.
Best, XuZhou
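The concordance metric described above (genes with the same copy number divided by overlapping genes) can be sketched in Python as follows. The dictionaries, gene names, and the median_offset helper are hypothetical. Comparing exact concordance with concordance after one per-sample shift is a way to tell scattered disagreements apart from a whole sample being offset by a constant. That is one possibility worth ruling out, not something established in this thread.

```python
import statistics


def concordance(wes_cn, array_cn, tolerance=0):
    """Fraction of genes present in both maps whose copy numbers agree within tolerance."""
    shared = wes_cn.keys() & array_cn.keys()
    if not shared:
        return float("nan")
    agree = sum(1 for gene in shared if abs(wes_cn[gene] - array_cn[gene]) <= tolerance)
    return agree / len(shared)


def median_offset(wes_cn, array_cn):
    """Median of (WES - array) copy number over shared genes (hypothetical helper)."""
    shared = wes_cn.keys() & array_cn.keys()
    return statistics.median(wes_cn[g] - array_cn[g] for g in shared)


# Toy data: every shared gene is off by exactly one copy.
wes = {"GENE1": 1, "GENE2": 4, "GENE3": 3, "GENE4": 0}
array = {"GENE1": 2, "GENE2": 5, "GENE3": 4, "GENE4": 1, "GENE5": 2}

offset = median_offset(wes, array)
shifted = {gene: cn - offset for gene, cn in wes.items()}
print(concordance(wes, array))      # 0.0
print(concordance(shifted, array))  # 1.0
```

If a sample with low exact concordance jumps close to 1.0 after the shift, the disagreement is systematic rather than gene-by-gene noise.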
Hi @zhouxuzhouxu,
First, could you please edit your last response to make it more readable? I think you should remove the "`" characters you added! Simply writing in plain text should work better.
Best, Felix
Hi Felix, I've removed the spaces at the beginning of the lines. The formatting should be normal now.
Hi @zhouxuzhouxu! In addition to what @tetedange13 already said:
CNVkit performs best on matched tumour-normal data. The normal data are used to accurately factor in baseline coverage variance. The flat reference can be thought of as a last-resort method, which will indeed perform significantly worse than a reference built from normal samples.
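The cost of a flat reference can be illustrated with a toy simulation in which all numbers are hypothetical. Each bin gets a fixed coverage bias shared by every sample captured with the same kit. This stands in for bin-specific effects, such as capture efficiency, that sequence-content corrections do not fully remove. With a flat reference that bias stays in the tumour's log2 ratios. With a pooled normal reference it is largely subtracted out. This is a sketch of the general idea, not of CNVkit's reference-building code.

```python
import random
import statistics

random.seed(0)  # fixed seed so the toy example is reproducible

N_BINS = 2000
N_NORMALS = 5
NOISE_SD = 0.1  # hypothetical per-sample sampling noise (log2 units)

# Hypothetical bin-specific bias shared by all samples captured with the same kit.
bin_bias = [random.gauss(0, 0.4) for _ in range(N_BINS)]


def simulate_log2_coverage():
    """Log2 coverage of a sample with no copy number change: shared bias plus noise."""
    return [b + random.gauss(0, NOISE_SD) for b in bin_bias]


tumour = simulate_log2_coverage()  # no true CNAs, so any spread is error
normals = [simulate_log2_coverage() for _ in range(N_NORMALS)]

# Flat reference: every bin is expected at log2 0, so the bias stays in the ratios.
flat_ratios = tumour
# Pooled normal reference: the per-bin mean of the normals estimates the bias.
pooled_reference = [statistics.mean(values) for values in zip(*normals)]
normal_ratios = [t - r for t, r in zip(tumour, pooled_reference)]

sd_flat = statistics.stdev(flat_ratios)
sd_normal = statistics.stdev(normal_ratios)
print(f"flat reference SD: {sd_flat:.3f}, pooled normal reference SD: {sd_normal:.3f}")
```

In this toy setup the spread of log2 ratios for a sample with no real copy number change shrinks several-fold once a normal reference is used, so far fewer bins drift across calling thresholds.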
CN estimation using microarrays has its own caveats and problems, so what may be happening is that you're comparing two relatively noisy methods of analysis. However, even in that case I wouldn't expect concordance as poor as <5% of genes with matching copy numbers, so I'm willing to investigate this further.
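A quick worked bound supports the point that noise alone is unlikely to explain concordance below 5%. It rests on a simplifying assumption that is not established here: the two methods err independently. If WES calls the true copy number for a fraction a of genes and the array for a fraction b, both are right, and therefore agree, on a * b of genes on average. Chance agreement on a wrong value can only add to that.

```python
import math


def min_expected_agreement(acc_wes, acc_array):
    """Lower bound on gene-level agreement when both methods err independently."""
    return acc_wes * acc_array


# Hypothetical accuracies, purely for illustration.
print(min_expected_agreement(0.9, 0.9))
# Accuracy each method would need (if equal) for the bound to fall to 5%:
print(math.sqrt(0.05))
```

With 90% accuracy each, the bound is about 81%. For it to fall to 5%, each method would have to be right for only about 22% of genes, which points to a systematic problem rather than ordinary noise.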
Let's try a few things:
Your BAM file names have rmdup in them, suggesting they went through a duplicate removal step. This can sometimes interfere with depth estimation. Could you try re-running the whole workflow without duplicate removal and see if it improves the results? While we're at it, you also shouldn't filter the reads by mapping quality, because that can sometimes degrade the results too.
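One way duplicate removal can distort depth is saturation. A position-based filter keeps only one read per start position, so in a strongly covered region the deduplicated depth can no longer grow in proportion to copy number. The toy model below uses hypothetical numbers and assumes single-end reads with uniformly random start positions, which exaggerates the effect compared with paired-end data. It shows how a true 4-fold coverage difference shrinks after deduplication.

```python
def expected_unique_starts(n_reads, n_positions):
    """Expected number of distinct start positions among n_reads random starts.

    With a start-position duplicate filter (sketch assumption), this is roughly
    the number of reads that survive deduplication.
    """
    return n_positions * (1 - (1 - 1 / n_positions) ** n_reads)


POSITIONS = 1000        # hypothetical possible start sites in one bin
NORMAL_READS = 1000     # hypothetical reads in the bin at normal copy number
AMPLIFIED_READS = 4000  # 4x the reads, as for a 4-fold copy ratio

ratio_before = AMPLIFIED_READS / NORMAL_READS
ratio_after = (expected_unique_starts(AMPLIFIED_READS, POSITIONS)
               / expected_unique_starts(NORMAL_READS, POSITIONS))
print(ratio_before, round(ratio_after, 2))
```

In this toy model the 4-fold difference drops to about 1.55 after deduplication, so an amplification would look far weaker than it is.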
Hi @zhouxuzhouxu, just checking in to see whether you were able to try some of the suggestions from my previous comment, or whether you have any further questions. I'll be happy to help; this issue does look worth investigating.
Hi tskir, thanks for your quick response. I tried your suggestions.
Best, Xu Zhou
Hi, I got gene-level CN (copy number) with CNVkit from WES data (20 tumor samples, unmatched normal). In the results, I found a clear discrepancy between the gene-level CN from WES (CNVkit) and the gene-level CN from SNP 6.0 arrays. Am I using the commands incorrectly? Here's my code:

python3 cnvkit.py batch \
    ${sample}.rmdup.bam \
    --reference ${dataCNVkit}/FlatReference.cnn \
    --output-dir ${dataCNVkit}/access_5k_Map/${sample}/

python3 ${cnvkit}cnvkit.py genemetrics \
    ${sample}.rmdup.cnr \
    -s ${sample}.rmdup.cns -t 0 -m 0 -y \
    -o ${sample}.geneLevel.cns

python3 ${cnvkit}cnvkit.py call \
    ${sample}.geneLevel.cns \
    -o ${sample}.geneLevel.call.cns
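For reference when interpreting the genemetrics step: as far as I understand the CNVkit documentation, when segments are passed with -s, each gene gets the log2 value of the segment covering it. Without -s, it gets a weighted average of the bin-level log2 ratios within the gene. Below is a minimal Python sketch of the second case. It assumes each bin has already been assigned to a single gene and carries a weight. The function name and record layout are hypothetical.

```python
from collections import defaultdict


def gene_level_log2(bins):
    """Weighted mean log2 ratio per gene.

    `bins` is an iterable of (gene, log2, weight) tuples, one per bin, each bin
    already assigned to a single gene (hypothetical record layout).
    """
    weighted_sums = defaultdict(float)
    total_weights = defaultdict(float)
    for gene, log2, weight in bins:
        weighted_sums[gene] += log2 * weight
        total_weights[gene] += weight
    return {
        gene: weighted_sums[gene] / total_weights[gene]
        for gene in weighted_sums
        if total_weights[gene] > 0
    }


# Toy bins: GENE1 has two bins, the second weighted three times as heavily.
toy_bins = [("GENE1", 0.0, 1.0), ("GENE1", 1.0, 3.0), ("GENE2", -1.0, 2.0)]
print(gene_level_log2(toy_bins))  # {'GENE1': 0.75, 'GENE2': -1.0}
```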