broadinstitute / ichorCNA

Estimating tumor fraction in cell-free DNA from ultra-low-pass whole genome sequencing.
GNU General Public License v3.0

CfDNA tumor fraction threshold for ichorCNA detection #16

Open thestarocean opened 6 years ago

thestarocean commented 6 years ago

I have been using ichorCNA for cfDNA analysis in HCC. However, the results for several patients show a tumor fraction of 0. Is there a theoretical detection threshold for ichorCNA? Other samples show fractions of around 0.04-0.08. I have also read articles on CNV detection in cfDNA samples suggesting that sequencing data with a lower tumor fraction only allow detection at larger bin sizes. That sounds reasonable, so I would like to try GC and mappability wig files for a 10 Mb bin size. Could you explain how these two kinds of files are generated?

avilella commented 6 years ago

I am another user of ichorCNA; we use it internally as a QC tool for our low-coverage samples. I have noticed that it reports values between 0.04 and 0.08 for samples that we know are negative controls and should not have any tumor fraction. See the bar chart below:

[Image: screen shot 2018-03-27 at 11 06 20]

I haven't worried about this so far, as we consider it a low-level noise value. But if there are any parameters we can tweak to avoid this, I would be interested in testing it.

gavinha commented 6 years ago

Hi, In our Nature Communications publication, we benchmarked ichorCNA using simulations built from real data and found that, with 0.1X coverage data, we achieve a limit of detection of 0.03 tumor fraction. You can read more about this on the wiki: https://github.com/broadinstitute/ichorCNA/wiki/Interpreting-ichorCNA-results Please let me know if you have questions about this.
It would be great to carry on this discussion in the Google Group, especially if it may be helpful to the user base. https://groups.google.com/a/broadinstitute.org/forum/?fromgroups&hl=en#!forum/ichorcna

For samples which you expect to have low tumor fraction (i.e. < 0.05), you can try the suggestions here: https://github.com/broadinstitute/ichorCNA/wiki/Parameter-tuning-and-settings

There are also a few considerations about the post-processing that is done in the R script. ichorCNA generates multiple solutions, and a set of filtering criteria removes solutions based on, for example, the % of subclonal CNAs and the % of the genome altered. You can set these arguments in the R script. Note that some of them will cause some solutions to be set to zero TF.

--maxFracCNASubclone=MAXFRACCNASUBCLONE Exclude solutions with fraction of subclonal events greater than this value. Default: [0.7]

--maxFracGenomeSubclone=MAXFRACGENOMESUBCLONE Exclude solutions with subclonal genome fraction greater than this value. Default: [0.5]

--minSegmentBins=MINSEGMENTBINS Minimum number of bins for largest segment threshold required to estimate tumor fraction; if below this threshold, then will be assigned zero tumor fraction.

--altFracThreshold=ALTFRACTHRESHOLD Minimum proportion of bins altered required to estimate tumor fraction; if below this threshold, then will be assigned zero tumor fraction. Default: [0.05]
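
As a rough illustration of how thresholds like these interact, the sketch below applies them to a single candidate solution. It is not ichorCNA's R code: the `Solution` fields, the function name, and the `min_segment_bins` value of 50 are hypothetical (the option text above gives no default for it), and the split between "excluded" and "kept with zero tumor fraction" is a simplified reading of the option descriptions. The other three cutoffs are the defaults quoted above.

```python
from dataclasses import dataclass


@dataclass
class Solution:
    """Hypothetical summary of one candidate solution (not an ichorCNA type)."""
    tumor_fraction: float
    frac_cna_subclonal: float     # fraction of CNA events called subclonal
    frac_genome_subclonal: float  # fraction of the genome that is subclonal
    largest_segment_bins: int     # number of bins in the largest segment
    frac_bins_altered: float      # proportion of bins with a copy-number change


def filter_solution(sol, max_frac_cna_subclone=0.7, max_frac_genome_subclone=0.5,
                    min_segment_bins=50, alt_frac_threshold=0.05):
    """Return (keep, tumor_fraction) after applying the four cutoffs."""
    # Too much subclonal signal: the solution is excluded outright.
    if sol.frac_cna_subclonal > max_frac_cna_subclone:
        return False, None
    if sol.frac_genome_subclonal > max_frac_genome_subclone:
        return False, None
    # Too little altered signal: keep the solution but report zero tumor fraction.
    if (sol.largest_segment_bins < min_segment_bins
            or sol.frac_bins_altered < alt_frac_threshold):
        return True, 0.0
    return True, sol.tumor_fraction
```

Under these assumptions, a sample with a real but faint signal (for example, only 2% of bins altered) would be reported as zero tumor fraction even though a non-zero estimate was computed.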

It is also possible that the correct solution was not the one chosen in the end. I encourage you to inspect the plots for all the solutions, especially if you are skeptical of the result.

With regard to 10 Mb bin sizes, you can generate the GC and mappability wig files using the simple tools available here: http://shahlab.ca/projects/hmmcopy_utils/
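
For reference, here is a minimal sketch of what the invocations could look like for 10 Mb bins, written as a small Python helper that only builds the command lines and does not run them. It assumes the `gcCounter` and `mapCounter` programs from hmmcopy_utils and their `-w` window-size option; the input and output file names (`genome.fa`, `mappability.bw`, `gc.10mb.wig`, `map.10mb.wig`) are placeholders.

```python
import shlex

BIN_SIZE = 10_000_000  # 10 Mb windows


def wig_commands(fasta="genome.fa", map_bigwig="mappability.bw", bin_size=BIN_SIZE):
    """Build shell command strings keyed by output name; file names are placeholders."""
    gc_cmd = ["gcCounter", "-w", str(bin_size), fasta]
    map_cmd = ["mapCounter", "-w", str(bin_size), map_bigwig]
    return {
        "gc.10mb.wig": shlex.join(gc_cmd),
        "map.10mb.wig": shlex.join(map_cmd),
    }


for out_name, cmd in wig_commands().items():
    # Assumption: each tool writes wig text to standard output, so redirect it.
    print(f"{cmd} > {out_name}")
```

The window size passed here should match the bin size used for the read-count wig file, since all three files are compared bin by bin.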

Hope this helps.

Best, Gavin

gavinha commented 6 years ago

Hi Albert,

For your issue of false positives, the parameters for increased sensitivity will probably not help. There are likely solutions in your runs that do predict 0 tumor fraction, but ichorCNA probably selected an incorrect solution instead.

A solution is generated for each combination of initial normal and ploidy values. The solution initialized with normal=0.95 (i.e. tumor fraction = 0.05) is incorrectly being selected as optimal.
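
To make the selection step concrete, here is a minimal sketch of running one fit per (normal, ploidy) initialization and picking a winner. The initial values, the `fit` callback, and the use of the highest log-likelihood as the selection rule are assumptions for illustration, not a description of ichorCNA's internals.

```python
from itertools import product


def best_solution(fit, normal_inits=(0.5, 0.7, 0.9, 0.95, 0.99), ploidy_inits=(2, 3)):
    """Run `fit` for every initialization and return the top-scoring result.

    `fit(normal, ploidy)` is a hypothetical callback returning a dict with at
    least a "loglik" key. All candidates are returned too, for inspection.
    """
    candidates = []
    for normal, ploidy in product(normal_inits, ploidy_inits):
        result = dict(fit(normal, ploidy))
        result.update(init_normal=normal, init_ploidy=ploidy)
        candidates.append(result)
    chosen = max(candidates, key=lambda r: r["loglik"])
    return chosen, candidates
```

Returning every candidate alongside the chosen one makes it easy to check whether a solution with zero tumor fraction scored almost as well as the one that was selected.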

For your false positive samples, are you using chrTrain=c(1:22) and excluding chrX? Are there issues with chr19? That is, do you see systematically lower coverage for chr19 across all your samples? If so, you may want to exclude it as well.

What is the MAD value in the params.txt file? If it's > 0.15, then it may be too noisy.
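
As a quick way to apply that cutoff, the sketch below computes a median absolute deviation for a list of per-bin log ratios and flags values above 0.15. This is the textbook MAD around the median; whether the value in params.txt is computed exactly this way is not stated here, so treat it as an approximation. The function names are hypothetical.

```python
from statistics import median


def mad(values):
    """Median absolute deviation around the median (unscaled)."""
    values = list(values)
    center = median(values)
    return median(abs(v - center) for v in values)


def too_noisy(log_ratios, cutoff=0.15):
    """True if the MAD exceeds the rule-of-thumb cutoff mentioned above."""
    return mad(log_ratios) > cutoff
```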

Best, Gavin

thestarocean commented 6 years ago

Thanks, Gavin, that's very valuable information. Since several samples from the positive group reached 0.04-0.08 and all samples in the negative group showed a tumor fraction of 0, I can reasonably assume that the positive samples reported as 0 simply have an extremely low tumor fraction. I will try adjusting the parameters and building 10 Mb bin-size files for another run.

thestarocean commented 6 years ago

@avilella I think our situations differ, since my negative control samples did not show any tumor fraction (equal to 0) and my positive samples were close to 0.04. I think tumor fraction is closely related to tumor type, since all of my HCC samples fall in a similar range, unlike the significantly higher tumor fractions I have read about in other types of cancer.

thestarocean commented 6 years ago

Adjusting the parameters seems to work fine. However, when I try to use a 10 Mb bin size, the mappability wig files generated from the Alignability and Uniqueness tracks do not work with the R script. The error message keeps saying "invalid x for loess function". The reason is that the generated mappability file contains no values that pass the mappability threshold (which is set to 0.9 by default according to utils.R). Did I choose the wrong mappability tracks? Could you give me a hint about which mappability tracks you use? Or should I just bypass this by providing no mappability file?
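
One way to confirm that diagnosis is to count how many bins in the wig file reach the threshold. The sketch below is a hypothetical helper, not part of ichorCNA: it reads wig text line by line, skips declaration lines, and compares each value against 0.9. Using `>=` rather than `>` for the comparison is an assumption, and the example values are made up.

```python
def count_passing_bins(wig_lines, threshold=0.9):
    """Count wig values at or above `threshold`; returns (passing, total)."""
    skip_prefixes = ("fixedStep", "variableStep", "track", "browser", "#")
    passing = total = 0
    for line in wig_lines:
        line = line.strip()
        # Skip blank lines and declaration lines such as "fixedStep chrom=...".
        if not line or line.startswith(skip_prefixes):
            continue
        # fixedStep rows hold one value; variableStep rows hold "position value".
        value = float(line.split()[-1])
        total += 1
        if value >= threshold:
            passing += 1
    return passing, total


example = [
    "fixedStep chrom=chr1 start=1 step=10000000 span=10000000",
    "0.71", "0.88", "0.64",
]
print(count_passing_bins(example))  # no bin reaches 0.9 in this made-up input
```

For a real file, the same function can be called on an open file handle, for example `count_passing_bins(open("map.10mb.wig"))` with whatever file name was used. If the passing count is zero, no bins are left for the loess fit, which would be consistent with the error above.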