broadinstitute / ichorCNA

Estimating tumor fraction in cell-free DNA from ultra-low-pass whole genome sequencing.
GNU General Public License v3.0

Downsampling method of ichorCNA benchmarking #63

Open yuanzhao0502 opened 4 years ago

yuanzhao0502 commented 4 years ago

I have a question about your benchmarking work. In your paper, you downsampled to the number of reads required to reach exactly 0.01, 0.02, …, 0.09, 0.10 tumor fraction at 0.1× coverage. In the equation, the read count determines what percentage of reads to keep. Did you downsample the BAM file with Picard, or did you simply multiply the read counts reported by HMMcopy by that percentage? I ask because when I downsample a BAM file with Picard, I find it difficult to control the final read count precisely. When I then run ichorCNA on the downsampled file to estimate tumor purity, the result is poor. My guess is that the downsampling is not random enough to preserve the expected tumor purity. Could you give me some suggestions?

gavinha commented 4 years ago

Hi @yuanzhao0502

I used Picard DownsampleSam, and I made sure to downsample a BAM that already had duplicates removed. The readCounter tool in the ichorCNA pipeline ignores duplicates, so sampling duplicate reads was an issue: they count toward the downsampled total but are not counted by readCounter.

Hope this helps. Gavin
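
For anyone following the same approach, here is a minimal sketch of the arithmetic involved. It is not the code used in the paper. The genome size, read length, and example read counts are assumptions, the function names are hypothetical, and the mixture split is a simple linear approximation that ignores ploidy. The command builder assumes a `picard` wrapper on the PATH and uses the `P` (probability), `R` (random seed), and `STRATEGY` arguments of DownsampleSam.

```python
"""Sketch: sizing a Picard DownsampleSam run on a duplicate-free BAM."""


def target_read_count(coverage, genome_size=3.1e9, read_length=100):
    """Reads needed for a given coverage (genome size and read length are assumed)."""
    return round(coverage * genome_size / read_length)


def downsample_probability(available_reads, wanted_reads):
    """Fraction of reads to keep; available_reads should exclude duplicates."""
    if available_reads <= 0:
        raise ValueError("available_reads must be positive")
    if wanted_reads > available_reads:
        raise ValueError("cannot downsample to more reads than are available")
    return wanted_reads / available_reads


def mixture_read_split(total_reads, target_tf, sample_tf):
    """Split total_reads between a tumor-bearing sample and a normal sample.

    Assumes the normal sample has tumor fraction 0 and that tumor fraction
    scales linearly with the share of reads (ploidy is ignored).
    """
    if not 0 <= target_tf <= sample_tf <= 1 or sample_tf == 0:
        raise ValueError("need 0 <= target_tf <= sample_tf <= 1 and sample_tf > 0")
    from_tumor_sample = round(total_reads * target_tf / sample_tf)
    return from_tumor_sample, total_reads - from_tumor_sample


def picard_command(in_bam, out_bam, probability, seed=1):
    """Build (but do not run) a DownsampleSam command line."""
    return [
        "picard", "DownsampleSam",
        f"I={in_bam}",
        f"O={out_bam}",
        f"P={probability:.6f}",
        f"R={seed}",
        "STRATEGY=HighAccuracy",  # aims to land closer to the requested fraction
    ]


if __name__ == "__main__":
    wanted = target_read_count(0.1)
    # 50,000,000 is a made-up count of non-duplicate reads in the input BAM.
    p = downsample_probability(50_000_000, wanted)
    print(" ".join(picard_command("dedup.bam", "downsampled.bam", p)))
```

Because each read is kept or dropped at random, the final read count will vary slightly around the target from run to run, so an exact match should not be expected.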