broadinstitute / ichorCNA

Estimating tumor fraction in cell-free DNA from ultra-low-pass whole genome sequencing.
GNU General Public License v3.0

Questions about bin size and coverage #42

Open yuanzhao0502 opened 6 years ago

yuanzhao0502 commented 6 years ago

Sorry if these are basic questions:

  1. Is there any documentation on how sample coverage is reported? I found the parameter file among the output files, but its coverage value is NA. What does that mean? I tried different bin sizes (10kb, 50kb, 500kb, 1Mb) and got the same result. Using another R package, "pasillaBamSubset", I found that the coverage of my sample is around 0.5.
  2. I don't know which bin size I should choose, because each bin size gives a different tumor fraction (10kb: 0.27, 50kb: 0.15, 500kb: 0.12, 1Mb: 0.05). Thanks for reading, and for any reply.

gavinha commented 5 years ago

Hi @yuanzhao0502

  1. The coverage field in the parameter file is hard-coded: it simply reports whatever value you pass to the script as coverage. It is a legacy feature and is not used in the Snakemake pipeline. To determine the coverage, you need to compute it yourself, as you have done.

  2. You should not use any bin size smaller than 500kb if you do not have a matched normal sample. With smaller bins, germline CNVs can confound the tumor fraction estimate. It is usually better to simply use 1Mb bins, because they give the cleanest normalized read coverage signal for estimating the tumor fraction. I am not sure why you are seeing a discrepancy between the 1Mb and 500kb results. Without having seen the results/plots, I would tend to trust the 1Mb results.
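If you want to compute coverage yourself without another package, a common back-of-the-envelope estimate is average depth = mapped reads × read length / genome length. The sketch below is not part of ichorCNA; it assumes you already have a mapped-read count (for example from `samtools view -c -F 0x904 sample.bam`, which excludes unmapped, secondary and supplementary alignments) and a known read length. The 10 million reads, 150 bp read length and ~3.1 Gb genome size in the example are hypothetical values chosen for illustration.

```python
"""Back-of-the-envelope average sequencing depth (coverage) estimate.

Assumption: coverage ~= mapped_reads * read_length / genome_length.
This is a generic helper for illustration, not an ichorCNA function.
"""


def mean_coverage(mapped_reads: int, read_length: int, genome_length: int) -> float:
    """Return the average sequencing depth across the genome."""
    if read_length <= 0 or genome_length <= 0:
        raise ValueError("read_length and genome_length must be positive")
    if mapped_reads < 0:
        raise ValueError("mapped_reads must be non-negative")
    # Total sequenced bases divided by the number of reference bases.
    return mapped_reads * read_length / genome_length


if __name__ == "__main__":
    # Hypothetical example: 10 million mapped 150 bp reads against a
    # ~3.1 Gb human reference.
    depth = mean_coverage(10_000_000, 150, 3_100_000_000)
    print(f"Approximate coverage: {depth:.2f}x")
```

For these hypothetical inputs the estimate is about 0.48x, in the same range as the ~0.5 reported in the question. The formula ignores duplicates and unmappable regions, so treat it as a rough figure.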
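One way to see why larger bins give a cleaner coverage signal at ultra-low depth is a simple counting-noise argument. Under an idealized model where reads land independently and uniformly, the read count in a bin is roughly Poisson with mean λ = coverage × bin size / read length, so its relative noise is 1/√λ. The sketch below assumes that model and a hypothetical 150 bp read length; it ignores the GC-content and mappability biases that real data carry, and it does not capture the germline CNV issue that matters when there is no matched normal.

```python
import math

# Idealized model (assumption): uniform, independent read placement, so the
# count per bin is Poisson with mean lambda = coverage * bin_size / read_length
# and relative standard deviation 1 / sqrt(lambda).
# Real data also have GC and mappability biases, which this ignores.


def expected_reads_per_bin(coverage: float, bin_size: int, read_length: int) -> float:
    """Expected number of reads starting in a bin of the given size."""
    return coverage * bin_size / read_length


def relative_noise(coverage: float, bin_size: int, read_length: int) -> float:
    """Poisson relative standard deviation (SD / mean) of the bin read count."""
    lam = expected_reads_per_bin(coverage, bin_size, read_length)
    return 1.0 / math.sqrt(lam)


if __name__ == "__main__":
    coverage = 0.5      # roughly the depth reported in the question
    read_length = 150   # hypothetical read length
    for bin_size in (10_000, 50_000, 500_000, 1_000_000):
        lam = expected_reads_per_bin(coverage, bin_size, read_length)
        noise = relative_noise(coverage, bin_size, read_length)
        print(f"{bin_size // 1000:>5} kb bins: ~{lam:,.0f} reads/bin, "
              f"relative noise ~{noise:.1%}")
```

Under these assumptions a 10kb bin holds only about 33 reads (roughly 17% counting noise), while a 1Mb bin holds about 3,300 reads (roughly 1.7%), which is consistent with 1Mb bins giving the smoothest normalized coverage.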

Hope this helps. Gavin