broadinstitute / ichorCNA

Estimating tumor fraction in cell-free DNA from ultra-low-pass whole genome sequencing.
GNU General Public License v3.0

minimum command-line execution with parameter defaults #4

Closed: avilella closed this issue 6 years ago

avilella commented 6 years ago

Hi all,

Now that most of the hg38 reference files are in place (thanks for adding them), I wonder what the minimal command line is for running ichorCNA on a WIG file created from an hg38 BAM file, without any prior knowledge about the sample. So far I have arrived at the command below, which I assembled from the wiki, but I wonder whether I can drop some of the parameters so that the defaults are picked up automatically:

$rscript $runichorcna --id $root --centromere $extdatadir/GRCh38.GCA_000001405.2_centromere_acen.txt --WIG $wigfile --includeHOMD False --estimateNormal True --estimatePloidy True --estimateScPrevalence True --scStates "c(1,3)" --txnE 0.9999 --txnStrength 10000 --normalPanel $extdatadir/HD_ULP_PoN_hg38_1Mb_median_normAutosome_median.rds --gcWig $extdatadir/gc_hg38_1000kb.wig --outDir $outdir

gavinha commented 6 years ago

Hi Albert,

Most of the command-line arguments you have listed are essential. Below is an indication of which arguments I tend to use and which ones need values other than the defaults.

$rscript $runichorcna --id $root \
--centromere $extdatadir/GRCh38.GCA_000001405.2_centromere_acen.txt \
--WIG $wigfile \
--includeHOMD False --estimateNormal True --estimatePloidy True \
--estimateScPrevalence True --scStates "c(1,3)" \
--txnE 0.9999 --txnStrength 10000 \
--normalPanel $extdatadir/HD_ULP_PoN_hg38_1Mb_median_normAutosome_median.rds \
--gcWig $extdatadir/gc_hg38_1000kb.wig --outDir $outdir
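If you call this from a script, one way to avoid line-continuation mistakes and shell quoting problems with R expressions such as c(1,3) is to collect the arguments in a bash array. This is a minimal sketch, not part of ichorCNA: the function build_ichorcna_args and the array ICHOR_ARGS are hypothetical names, and the file names are simply those from the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ichorCNA): collect the arguments from the
# command above in a bash array so each one reaches Rscript exactly as written.
set -euo pipefail

build_ichorcna_args() {
  local id=$1 wig=$2 extdata=$3 outdir=$4
  ICHOR_ARGS=(
    --id "$id"
    --WIG "$wig"
    --centromere "$extdata/GRCh38.GCA_000001405.2_centromere_acen.txt"
    --normalPanel "$extdata/HD_ULP_PoN_hg38_1Mb_median_normAutosome_median.rds"
    --gcWig "$extdata/gc_hg38_1000kb.wig"
    --includeHOMD False
    --estimateNormal True
    --estimatePloidy True
    --estimateScPrevalence True
    --scStates "c(1,3)"   # passed to the R script as the text c(1,3)
    --txnE 0.9999
    --txnStrength 10000
    --outDir "$outdir"
  )
}

# Usage (not executed here):
#   build_ichorcna_args "$root" "$wigfile" "$extdatadir" "$outdir"
#   "$rscript" "$runichorcna" "${ICHOR_ARGS[@]}"
```

Quoting "${ICHOR_ARGS[@]}" expands each element as a separate argument, so no backslashes or escaped quotes are needed.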
  1. The values given for --includeHOMD, --estimateNormal and --estimatePloidy are likely already the defaults, so these arguments are probably not necessary. If you are using smaller bin sizes, such as 1 kb to 100 kb, it may be best to leave --includeHOMD (the homozygous-deletion state) as FALSE.
  2. By default, --estimateScPrevalence and --scStates assume 2 subclonal states. You may not want to estimate clonality, in which case you can turn this off, but you will need to set these arguments explicitly.
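  3. --txnE and --txnStrength can be used to adjust the level of segmentation. --txnE is the self-transition probability, so raising it makes state changes less likely and yields fewer segments. For example, --txnE 0.9999 --txnStrength 10000 will lead to more segments than --txnE 0.9999999 --txnStrength 10000000.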
  4. I would recommend that you actually include these arguments: --normal "c(0.5,0.75,0.85,0.95)" --ploidy "c(2,3,4)". If you have some prior knowledge of your sample's expected tumor fraction and ploidy, you can help model selection by initializing the range of values for each model. Note that --normal sets the normal (non-tumor) fraction, so a low expected tumor fraction corresponds to high --normal values. For example, if you are working with cfDNA and expect low tumor fractions, use --normal "c(0.75,0.85,0.90,0.95)"; if you are working with cell lines, use --normal "c(0.01)".
    These are only initial values: ichorCNA still learns these parameters starting from each value you provide and selects the best solution. You can see all the ranked solutions in the output PDF *all_sols.pdf.
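A rough way to see how --txnE affects segmentation: as a simplification, suppose the self-transition probability were fixed at p = txnE (ichorCNA actually learns the transition probabilities, with --txnStrength controlling how strongly they are pulled toward txnE). The number of consecutive bins L spent in one copy-number state before switching is then geometric, with mean

$$
\mathbb{E}[L] = \sum_{k \ge 1} k \, p^{k-1} (1 - p) = \frac{1}{1 - p},
\qquad
p = 0.9999 \Rightarrow \mathbb{E}[L] = 10^{4}, \quad
p = 0.9999999 \Rightarrow \mathbb{E}[L] = 10^{7} \text{ bins}.
$$

So a larger txnE favors much longer runs in each state and therefore fewer segments.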

Finally, you may wish to simply modify the default values directly in the R script. We provide the R script as a convenience to get the analysis running, but we encourage you to customize it for your needs.
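As an alternative to editing the R script itself, a small wrapper can hold your own defaults. This is a hypothetical sketch, not part of ichorCNA: the variables SAMPLE_TYPE, NORMAL_INIT, PLOIDY_INIT, TXN_E and TXN_STRENGTH and the functions normal_init_for and ichorcna_args are made-up names. It only prints extra arguments for the R script ($runichorcna), using the initial values recommended in item 4 and the segmentation settings from the command above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of ichorCNA): keep your own defaults in one
# place; any of them can be overridden by setting the variable before the call.
set -euo pipefail

# Pick initial --normal values by sample type (values from the recommendations above).
normal_init_for() {
  case "$1" in
    cfdna)     echo "c(0.75,0.85,0.90,0.95)" ;;  # low expected tumor fraction
    cell_line) echo "c(0.01)" ;;                 # almost no normal contamination
    *)         echo "c(0.5,0.75,0.85,0.95)" ;;   # general starting range
  esac
}

# Print one argument per line; change TXN_STRENGTH together with TXN_E,
# as in the paired examples in item 3.
ichorcna_args() {
  local sample_type=${SAMPLE_TYPE:-unknown}
  printf '%s\n' \
    --normal "${NORMAL_INIT:-$(normal_init_for "$sample_type")}" \
    --ploidy "${PLOIDY_INIT:-c(2,3,4)}" \
    --txnE "${TXN_E:-0.9999}" \
    --txnStrength "${TXN_STRENGTH:-10000}"
}

# Usage (not executed here):
#   mapfile -t extra < <(SAMPLE_TYPE=cfdna ichorcna_args)
#   "$rscript" "$runichorcna" <other arguments as above> "${extra[@]}"
```

Keeping the defaults in a wrapper leaves the R script untouched, so it is easier to update ichorCNA later.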

Hope this helps. Best, Gavin