broadinstitute / ichorCNA

Estimating tumor fraction in cell-free DNA from ultra-low-pass whole genome sequencing.
GNU General Public License v3.0

Understanding ichorCNA output #68

Closed by CuriusScientist 4 years ago

CuriusScientist commented 4 years ago

Dear All,

I ran my samples through ichorCNA, after tuning the parameters as described at https://github.com/broadinstitute/ichorCNA/wiki/Parameter-tuning-and-settings, using the following command:

Rscript runIchorCNA.R \
  --id tumor_sample \
  --WIG tumor.wig \
  --ploidy "c(2)" \
  --normal "c(0.95, 0.99, 0.995, 0.999,0.9999)" \
  --maxCN 4 \
  --gcWig gc_hg19_1000kb.wig \
  --mapWig map_hg19_1000kb.wig \
  --centromere GRCh37.p13_centromere_UCSC-gapTable.txt \
  --normalPanel HD_ULP_PoN_1Mb_median_normAutosome_mapScoreFiltered_median.rds \
  --chrs "c(1:22)" \
  --chrTrain "c(1:22)" \
  --estimateScPrevalence FALSE \
  --scStates "c()" \
  --txnE 0.9999 \
  --txnStrength 10000 \
  --outDir ./

[Image: screenshot, 2019-11-14 14:04:12]

From the wiki https://github.com/broadinstitute/ichorCNA/wiki/Output "The colour of each data point corresponds to the estimated integer copy number. The colour mapping is:

1 copy = dark green
2 copies = blue
3 copies = brown
4+ copies = red

The segment medians are also plotted as horizontal lines with the same colour as the event itself if it is predicted to be clonal. A light green segment represents a subclonal prediction. The estimated tumor fraction and ploidy is printed at the top of the plot as well."

But nothing is mentioned about light green dots.

This problem was also discussed in the GitHub issue "Confusion about light green or dark green lines/dots #43", but it was never resolved.

Moreover, I also cannot understand why a single segment contains dots of more than one colour, which would suggest that the same segment has both normal copy number and an amplification or deletion.

The segment file looks like this

chrom start end num.mark seg.median.logR copy.number call subclone.status logR_Copy_Number Corrected_Copy_Number Corrected_Call
1 1000001 248000000 247 0.0077400588943692 3 GAIN FALSE 3.06143116897855 3 GAIN
2 1000001 243000000 242 -0.001246488515328 2 NEUT FALSE 2.15459242259098 2 NEUT
3 1000001 197000000 196 -0.00499316753651675 2 NEUT FALSE 1.77817773161814 2 NEUT
4 2000001 191000000 189 0.00244536809430688 2 NEUT FALSE 2.52645674145232 2 NEUT
5 1000001 180000000 179 0.00744286371594607 2 NEUT FALSE 3.03135059396495 2 NEUT
6 1000001 170000000 169 0.00146722085658406 2 NEUT FALSE 2.42783963385733 2 NEUT
7 1000001 159000000 158 0.00134170638802669 2 NEUT FALSE 2.4151900658692 2 NEUT
8 1000001 145000000 144 0.000884372060065001 2 NEUT FALSE 2.36910842049991 2 NEUT
9 1000001 141000000 140 -0.0124807547987305 2 NEUT FALSE 1.02885067719508 2 NEUT
10 1000001 135000000 134 -0.00332829745933091 2 NEUT FALSE 1.94532026589217 2 NEUT
11 2000001 134000000 132 0.0164900043173607 4 AMP FALSE 3.94983845967904 4 AMP
12 1000001 132000000 131 -0.00333262554445345 2 NEUT FALSE 1.94488550312822 2 NEUT
13 20000001 114000000 94 0.0142682539543918 4 AMP FALSE 3.72374708309861 4 AMP
14 21000001 106000000 85 -0.0118445262856984 2 NEUT FALSE 1.09237074141993 2 NEUT
15 25000001 102000000 77 -0.0192455518454588 2 NEUT FALSE 0.355193407471677 2 NEUT
16 1000001 90000000 89 -0.0285307371286201 1 HETD FALSE 0.015625 1 HETD
17 1000001 81000000 80 -0.0133593560786317 2 NEUT FALSE 0.941178532581174 2 NEUT
18 1000001 78000000 77 0.01037009195391 2 NEUT FALSE 3.32789988745343 2 NEUT
19 1000001 59000000 58 -0.0313714036641306 1 HETD FALSE 0.015625 1 HETD
20 1000001 61000000 60 -0.00227657277452278 2 NEUT FALSE 2.05100629388458 2 NEUT
21 16000001 48000000 32 0.0191431204975555 2 NEUT FALSE 4.22028334056463 2 NEUT
22 22000001 50000000 28 0.00436703788877074 2 NEUT FALSE 2.72039490613585 2 NEUT
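
One way to see where the integer calls and the per-segment values disagree is to compare the copy.number column with the rounded logR_Copy_Number column. The sketch below embeds a few rows from the table above and assumes the file is whitespace-delimited; the helper name `disagreeing_segments` is hypothetical and is not part of ichorCNA.

```python
# A few rows copied from the segment file above (whitespace-delimited).
SEGMENTS = """\
chrom start end num.mark seg.median.logR copy.number call subclone.status logR_Copy_Number Corrected_Copy_Number Corrected_Call
1 1000001 248000000 247 0.0077400588943692 3 GAIN FALSE 3.06143116897855 3 GAIN
2 1000001 243000000 242 -0.001246488515328 2 NEUT FALSE 2.15459242259098 2 NEUT
5 1000001 180000000 179 0.00744286371594607 2 NEUT FALSE 3.03135059396495 2 NEUT
9 1000001 141000000 140 -0.0124807547987305 2 NEUT FALSE 1.02885067719508 2 NEUT
16 1000001 90000000 89 -0.0285307371286201 1 HETD FALSE 0.015625 1 HETD
"""


def disagreeing_segments(text):
    """Return chromosomes whose integer call differs from round(logR_Copy_Number)."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    flagged = []
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        called = int(row["copy.number"])
        from_log_ratio = round(float(row["logR_Copy_Number"]))
        if called != from_log_ratio:
            flagged.append(row["chrom"])
    return flagged


print(disagreeing_segments(SEGMENTS))
```

Flagged segments are those where the two columns differ by at least half a copy. This only highlights the inconsistency; it does not explain its cause.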

and params.txt file looks like this

Gender: unknown
Tumor Fraction: 0.0138
Ploidy: 2.28
Subclone Fraction: NA
Fraction Genome Subclonal: 0
Fraction CNA Subclonal: 0
Coverage: NA
ChrY coverage fraction: NA
Student's t mean: -0.0089, -0.0019, 0.0049, 0.012
Student's t precision: 1000, 910, 920, 1200
Gamma Rate Init: 0.0011
GC-Map correction MAD: 0.06234

init        n_est  phi_est  BIC  Frac_genome_subclonal  Frac_CNA_subclonal  loglik
n0.95-p2    0.96   1.983    NA   0                      0                   3751
n0.99-p2    0.98   2.126    NA   0                      0                   3886
n0.995-p2   0.98   2.208    NA   0                      0                   3916
n0.999-p2   0.99   2.28     NA   0                      0                   3936
n0.9999-p2  0.99   2.283    NA   0                      0                   3927
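
The solutions table can also be read programmatically. The sketch below assumes the table is whitespace-delimited with the header shown above, and assumes (this is not confirmed anywhere in this thread) that the reported solution is the one with the highest loglik. The names `SOLUTIONS` and `parse_solutions` are illustrative.

```python
# Solutions table from params.txt, one initialisation per row.
SOLUTIONS = """\
init n_est phi_est BIC Frac_genome_subclonal Frac_CNA_subclonal loglik
n0.95-p2 0.96 1.983 NA 0 0 3751
n0.99-p2 0.98 2.126 NA 0 0 3886
n0.995-p2 0.98 2.208 NA 0 0 3916
n0.999-p2 0.99 2.28 NA 0 0 3936
n0.9999-p2 0.99 2.283 NA 0 0 3927
"""


def parse_solutions(text):
    """Parse the whitespace-delimited table into a list of dicts of strings."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


rows = parse_solutions(SOLUTIONS)
# Assumption: the preferred solution is the one with the largest log-likelihood.
best = max(rows, key=lambda r: float(r["loglik"]))
print(best["init"], "n_est =", best["n_est"], "phi_est =", best["phi_est"])
```

For the table above this selects n0.999-p2, whose phi_est of 2.28 matches the Ploidy line in params.txt. Every row was initialised at ploidy 2 (the "p2" in the init name), yet phi_est ranges from 1.983 to 2.283, so the value passed to --ploidy looks like a starting point for the estimate and not a fixed setting.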

In my understanding, blue dots should be centred around 0, brown dots around 0.58, red dots around 1, and green dots around -1:

One-copy gain = log2(3/2) = 0.58 (3 copies vs. 2 copies in reference)

One-copy loss = log2(1/2) = -1

Two-copy gain = log2(4/2) = 1

No loss or gain = log2(2/2) = 0

but I see a different behaviour
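
One point worth checking here: log2(3/2), log2(1/2) and log2(4/2) are the expected values for a pure tumor sample. The sketch below assumes a commonly used two-component mixture model, in which a fraction `tumor_fraction` of the DNA carries `copies` copies and the remainder carries 2 copies, normalised by the average tumor ploidy. The function name and the exact formula are illustrative assumptions, not taken from the ichorCNA source. It plugs in the Tumor Fraction (0.0138) and Ploidy (2.28) reported in params.txt.

```python
import math


def expected_log_ratio(copies, tumor_fraction, tumor_ploidy, base=2.0):
    """Expected log ratio for a segment with `copies` copies in the tumor.

    Assumes a mixture of normal cells (2 copies, fraction 1 - tumor_fraction)
    and tumor cells (fraction tumor_fraction), normalised by tumor ploidy.
    """
    normal_fraction = 1.0 - tumor_fraction
    observed = 2.0 * normal_fraction + copies * tumor_fraction
    baseline = 2.0 * normal_fraction + tumor_ploidy * tumor_fraction
    return math.log(observed / baseline, base)


# Values reported in params.txt.
TUMOR_FRACTION = 0.0138
PLOIDY = 2.28

for copies in (1, 2, 3, 4):
    pure = math.log2(copies / 2.0)
    mixed_log2 = expected_log_ratio(copies, TUMOR_FRACTION, PLOIDY)
    mixed_ln = expected_log_ratio(copies, TUMOR_FRACTION, PLOIDY, base=math.e)
    print(f"{copies} copies: pure={pure:+.3f}  log2={mixed_log2:+.4f}  ln={mixed_ln:+.4f}")
```

Under these assumptions, a tumor fraction of about 1% compresses the expected shifts to roughly ±0.01, which is the same scale as the seg.median.logR column. The natural-log values also come out close to the four "Student's t mean" entries in params.txt (-0.0089, -0.0019, 0.0049, 0.012), which suggests the copy number states are separated at this compressed scale and not at the pure-tumor values.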

Lastly, in another dataset, I get a ploidy of 3 even though I am using --ploidy "c(2)". Can someone shed light on that?

P.S. I raised this issue on the Google group https://groups.google.com/a/broadinstitute.org/forum/?fromgroups&hl=en#!topic/ichorcna/wYddw8Nwegs but the group is inactive, so after waiting a long time I am posting it here.

lbeltrame commented 4 years ago

For what it's worth, I ended up locally reverting the commit that introduced this change, as I could no longer interpret the plots.

gavinha commented 4 years ago

Hi @CuriusScientist and @lbeltrame

Thank you for bringing up this issue. I introduced a plotting problem in a recent commit. It only affects low tumor fraction samples: briefly, the copy number correction step was being applied to them when it should not be. I can provide a quick fix here so that the correction is skipped for cases/solutions with a tumor fraction (TF) below 0.05.

Sorry for the inconvenience.

PS. I am only going to do minor fixes/patches on this repo. Most of the new features and development will be pushed to https://github.com/GavinHaLab/ichorCNA

lbeltrame commented 4 years ago

@gavinha You might want to edit the README in this repository pointing to the new one just in case.

CuriusScientist commented 4 years ago

@gavinha Thanks for that. I have downloaded the latest version and will test it now.