Understanding ichorCNA output

CuriusScientist commented 4 years ago

Dear All,

I ran the samples through ichorCNA after tuning parameters from https://github.com/broadinstitute/ichorCNA/wiki/Parameter-tuning-and-settings using the following script

Rscript runIchorCNA.R \
  --id tumor_sample \
  --WIG tumor.wig \
  --ploidy "c(2)" \
  --normal "c(0.95, 0.99, 0.995, 0.999,0.9999)" \
  --maxCN 4 \
  --gcWig gc_hg19_1000kb.wig \
  --mapWig map_hg19_1000kb.wig \
  --centromere GRCh37.p13_centromere_UCSC-gapTable.txt \
  --normalPanel HD_ULP_PoN_1Mb_median_normAutosome_mapScoreFiltered_median.rds \
  --chrs "c(1:22)" \
  --chrTrain "c(1:22)" \
  --estimateScPrevalence FALSE \
  --scStates "c()" \
  --txnE 0.9999 \
  --txnStrength 10000 \
  --outDir ./

[Screenshot: 2019-11-14 at 14:04:12]

From the wiki https://github.com/broadinstitute/ichorCNA/wiki/Output "The colour of each data point corresponds to the estimated integer copy number. The colour mapping is:

1 copy = dark green
2 copies = blue
3 copies = brown
4+ copies = red

The segment medians are also plotted as horizontal lines with the same colour as the event itself if it is predicted to be clonal. A light green segment represents a subclonal prediction. The estimated tumor fraction and ploidy is printed at the top of the plot as well."

But nothing is mentioned about light green dots.

This problem is also discussed in the GitHub issue "Confusion about light green or dark green lines/dots #43", but it was never resolved.

Moreover, I do not understand why a single segment contains dots of more than one colour, which suggests that the same segment is both neutral and amplified or deleted.

The segment file looks like this

chrom	start	end	num.mark	seg.median.logR	copy.number	call	subclone.status	logR_Copy_Number	Corrected_Copy_Number	Corrected_Call
1	1000001	248000000	247	0.0077400588943692	3	GAIN	FALSE	3.06143116897855	3	GAIN
2	1000001	243000000	242	-0.001246488515328	2	NEUT	FALSE	2.15459242259098	2	NEUT
3	1000001	197000000	196	-0.00499316753651675	2	NEUT	FALSE	1.77817773161814	2	NEUT
4	2000001	191000000	189	0.00244536809430688	2	NEUT	FALSE	2.52645674145232	2	NEUT
5	1000001	180000000	179	0.00744286371594607	2	NEUT	FALSE	3.03135059396495	2	NEUT
6	1000001	170000000	169	0.00146722085658406	2	NEUT	FALSE	2.42783963385733	2	NEUT
7	1000001	159000000	158	0.00134170638802669	2	NEUT	FALSE	2.4151900658692	2	NEUT
8	1000001	145000000	144	0.000884372060065001	2	NEUT	FALSE	2.36910842049991	2	NEUT
9	1000001	141000000	140	-0.0124807547987305	2	NEUT	FALSE	1.02885067719508	2	NEUT
10	1000001	135000000	134	-0.00332829745933091	2	NEUT	FALSE	1.94532026589217	2	NEUT
11	2000001	134000000	132	0.0164900043173607	4	AMP	FALSE	3.94983845967904	4	AMP
12	1000001	132000000	131	-0.00333262554445345	2	NEUT	FALSE	1.94488550312822	2	NEUT
13	20000001	114000000	94	0.0142682539543918	4	AMP	FALSE	3.72374708309861	4	AMP
14	21000001	106000000	85	-0.0118445262856984	2	NEUT	FALSE	1.09237074141993	2	NEUT
15	25000001	102000000	77	-0.0192455518454588	2	NEUT	FALSE	0.355193407471677	2	NEUT
16	1000001	90000000	89	-0.0285307371286201	1	HETD	FALSE	0.015625	1	HETD
17	1000001	81000000	80	-0.0133593560786317	2	NEUT	FALSE	0.941178532581174	2	NEUT
18	1000001	78000000	77	0.01037009195391	2	NEUT	FALSE	3.32789988745343	2	NEUT
19	1000001	59000000	58	-0.0313714036641306	1	HETD	FALSE	0.015625	1	HETD
20	1000001	61000000	60	-0.00227657277452278	2	NEUT	FALSE	2.05100629388458	2	NEUT
21	16000001	48000000	32	0.0191431204975555	2	NEUT	FALSE	4.22028334056463	2	NEUT
22	22000001	50000000	28	0.00436703788877074	2	NEUT	FALSE	2.72039490613585	2	NEUT

and params.txt file looks like this

Gender: unknown
Tumor Fraction: 0.0138
Ploidy: 2.28
Subclone Fraction: NA
Fraction Genome Subclonal: 0
Fraction CNA Subclonal: 0
Coverage: NA
ChrY coverage fraction: NA
Student's t mean: -0.0089, -0.0019, 0.0049, 0.012
Student's t precision: 1000, 910, 920, 1200
Gamma Rate Init: 0.0011
GC-Map correction MAD: 0.06234

init	n_est	phi_est	BIC	Frac_genome_subclonal	Frac_CNA_subclonal	loglik
n0.95-p2	0.96	1.983	NA	0	0	3751
n0.99-p2	0.98	2.126	NA	0	0	3886
n0.995-p2	0.98	2.208	NA	0	0	3916
n0.999-p2	0.99	2.28	NA	0	0	3936
n0.9999-p2	0.99	2.283	NA	0	0	3927
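A minimal sketch of how the two parts of params.txt relate, assuming (the thread does not state this) that the reported solution is the initialization with the highest log-likelihood. That assumption is consistent with the values shown: the best row has phi_est 2.28, and the summary reports Ploidy 2.28. The rows are copied from the table above; `Solution` and `best_solution` are illustrative names, not part of ichorCNA.

```python
from typing import List, NamedTuple


class Solution(NamedTuple):
    init: str       # initialization label: normal fraction and ploidy seed
    n_est: float    # estimated normal fraction
    phi_est: float  # estimated tumour ploidy
    loglik: float   # log-likelihood of the fitted solution


# Rows copied from the solutions table in params.txt above.
SOLUTIONS = [
    Solution("n0.95-p2", 0.96, 1.983, 3751),
    Solution("n0.99-p2", 0.98, 2.126, 3886),
    Solution("n0.995-p2", 0.98, 2.208, 3916),
    Solution("n0.999-p2", 0.99, 2.28, 3936),
    Solution("n0.9999-p2", 0.99, 2.283, 3927),
]


def best_solution(solutions: List[Solution]) -> Solution:
    """Return the solution with the highest log-likelihood (assumed selection rule)."""
    return max(solutions, key=lambda s: s.loglik)


best = best_solution(SOLUTIONS)
print(best.init, best.phi_est)
```

Every run in this table starts from ploidy 2 (the `p2` suffix), yet phi_est ranges from 1.983 to 2.283, which suggests that --ploidy only seeds the fit and the reported ploidy is estimated from the data.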

In my understanding, blue dots should be centred around 0, brown dots around 0.58, red dots around 1 and dark green dots around -1:

One-copy gain = log2(3/2) = 0.58 (3 copies vs. 2 copies in reference)

One-copy loss = log2(1/2) = -1

Two-copy gain = log2(4/2) = 1

No loss or gain= log2(2/2) = 0

but I see a different behaviour
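The expected values listed above hold for a pure tumour sample. A hedged sketch of why the observed log ratios could be much smaller: under a commonly used purity/ploidy mixture model (an assumption here, not necessarily ichorCNA's exact implementation), the expected log2 ratio shrinks toward 0 as the tumour fraction falls. `expected_log_ratio` is a hypothetical helper; the tumour fraction 0.0138 and ploidy 2.28 are taken from params.txt.

```python
import math


def expected_log_ratio(copy_number: float, tumor_fraction: float,
                       tumor_ploidy: float, normal_copies: float = 2.0) -> float:
    """Expected log2 ratio of a segment in a tumour/normal mixture.

    Assumed model: the signal is the mixture-averaged copy number of the
    segment divided by the mixture-averaged copy number of the whole genome.
    """
    normal_fraction = 1.0 - tumor_fraction
    segment = normal_fraction * normal_copies + tumor_fraction * copy_number
    genome_avg = normal_fraction * normal_copies + tumor_fraction * tumor_ploidy
    return math.log2(segment / genome_avg)


for cn in (1, 2, 3, 4):
    pure = expected_log_ratio(cn, tumor_fraction=1.0, tumor_ploidy=2.0)
    diluted = expected_log_ratio(cn, tumor_fraction=0.0138, tumor_ploidy=2.28)
    print(f"{cn} copies: pure={pure:+.3f}  diluted={diluted:+.4f}")
```

With a tumour fraction of 1 and ploidy 2 this reproduces the -1, 0, 0.58 and 1 listed above. At a tumour fraction of 0.0138 every state lands within about 0.02 of zero, the same order of magnitude as the Student's t means in params.txt, so the dots of different copy number states overlap heavily.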

Lastly, in another dataset I get a ploidy of 3 even though I am using --ploidy "c(2)". Can someone shed light on that?

P.S. I raised this issue on the Google group (https://groups.google.com/a/broadinstitute.org/forum/?fromgroups&hl=en#!topic/ichorcna/wYddw8Nwegs), but the group is inactive, so after waiting a long time I am posting it here.

lbeltrame commented 4 years ago

For what it's worth, I ended up reverting locally the commit which introduced this change as I was not able to interpret the plots anymore.

gavinha commented 4 years ago

Hi @CuriusScientist and @lbeltrame

Thank you for bringing up this issue. I had introduced a problem with the plotting in a recent commit. It only affects low tumor fraction samples. Briefly, the copy number correction step was being applied to low tumor fraction samples. I can provide a quick fix here to make sure not to do that for cases/solutions having < 0.05 TF.

Sorry for the inconvenience.

PS. I am only going to do minor fixes/patches on this repo. Most of the new features and development will be pushed to https://github.com/GavinHaLab/ichorCNA

lbeltrame commented 4 years ago

@gavinha You might want to edit the README in this repository pointing to the new one just in case.

CuriusScientist commented 4 years ago

@gavinha Thanks for that. I have downloaded the latest version and will test it now.

broadinstitute / ichorCNA

Understanding ichorCNA output #68