ccf histogram calculation

rbonneville commented 5 years ago

Hello, we are interested in testing PhylogicNDT with our own multi-sample sequencing data. How should we compute raw ccf histograms for clustering? Thank you.

alipsky commented 5 years ago

We are also interested in this. Are you able to provide an example?

iglc commented 5 years ago

The easiest way is to run ABSOLUTE. We will push a tool that generates the histograms automatically if you have alt/ref counts and absolute copy-number data.

abenjak commented 5 years ago

To fix the generation of CCF histograms in ABSOLUTEv1.06, see https://github.com/broadinstitute/PhylogicNDT/issues/4#issuecomment-555588341

judithabk6 commented 4 years ago

I am facing the same issue. As far as I understand, this R script computes the CCF histograms but requires the multiplicity of each SNV. Is the multiplicity inferred in ABSOLUTE? How can it be assessed prior to subclonal reconstruction, in particular for WES data?
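On the multiplicity question, one common heuristic is to back-calculate the number of mutated copies from the observed allele fraction, purity, and local copy number, under the assumption that the mutation is clonal. The sketch below illustrates that heuristic only; it is not taken from ABSOLUTE or PhylogicNDT, and the function name `estimate_multiplicity` is hypothetical. It also assumes that normal cells are diploid at the locus.

```python
def estimate_multiplicity(alt, ref, purity, cn_a1, cn_a2):
    """Heuristic estimate of mutated copies per tumor cell.

    Assumptions (not from the thread): the mutation is clonal (CCF = 1)
    and the normal cells carry two copies of the locus.
    """
    depth = alt + ref
    if depth == 0:
        raise ValueError("no reads cover this site")
    vaf = alt / depth
    total_cn = cn_a1 + cn_a2
    # Average number of copies of the locus per cell in the tumor/normal mix.
    avg_copies = purity * total_cn + 2.0 * (1.0 - purity)
    # Mutated copies per tumor cell implied by the VAF if the CCF were 1.
    raw = vaf * avg_copies / purity
    major_cn = max(cn_a1, cn_a2)
    # Clamp to [1, major allele copy number]; a raw value well below 1
    # hints at a subclonal mutation rather than a fractional multiplicity.
    return int(min(max(round(raw), 1), max(major_cn, 1)))


m = estimate_multiplicity(alt=65, ref=162, purity=0.6354575886305256,
                          cn_a1=5, cn_a2=4)
print(m)
```

Because the estimate assumes clonality, it is circular for truly subclonal mutations, which is why tools that model multiplicity and CCF jointly are usually preferred.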

jcha40 commented 4 years ago

Hello,

If you clone the latest update, you should be able to get PhylogicNDT to calculate the CCF input histograms without ABSOLUTE input by setting the --maf_input_type flag to calc_ccf. PhylogicNDT requires sample purity and local copy number for each mutation (column names local_cn_a1 and local_cn_a2) in order to calculate the CCF.
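To illustrate what a CCF histogram computed from these inputs can look like, here is a minimal sketch that evaluates a binomial likelihood on a grid of 101 CCF values. It is not PhylogicNDT's implementation: the function name `ccf_histogram`, the default multiplicity of 1, the flat prior over CCF, and the diploid-normal assumption are all choices made for this example.

```python
import math


def ccf_histogram(alt, ref, purity, cn_a1, cn_a2, multiplicity=1, n_bins=101):
    """Normalized likelihood over n_bins CCF values evenly spaced in [0, 1].

    Assumptions of this sketch: diploid normal cells, a fixed multiplicity,
    and a flat prior over CCF.
    """
    alt, ref = int(alt), int(ref)
    total_cn = cn_a1 + cn_a2
    # Average number of copies of the locus per cell in the sample.
    avg_copies = purity * total_cn + 2.0 * (1.0 - purity)
    log_like = []
    for i in range(n_bins):
        ccf = i / (n_bins - 1)
        # Expected alt allele fraction at this CCF, kept away from 0 and 1
        # so that the logarithms below stay finite.
        af = purity * multiplicity * ccf / avg_copies
        af = min(max(af, 1e-9), 1.0 - 1e-9)
        # Binomial log-likelihood; the binomial coefficient is constant
        # across the grid and cancels in the normalization.
        log_like.append(alt * math.log(af) + ref * math.log(1.0 - af))
    peak = max(log_like)
    weights = [math.exp(v - peak) for v in log_like]
    total = sum(weights)
    return [w / total for w in weights]


hist = ccf_histogram(alt=7, ref=55, purity=0.6354575886305256,
                     cn_a1=1, cn_a2=0)
print(len(hist), hist.index(max(hist)) / 100)
```

Subtracting the peak log-likelihood before exponentiating avoids underflow at high read depth.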

judithabk6 commented 4 years ago

Hi, thank you for this update. I am not sure I am running it correctly, though.

I ran:

/bioinfo/users/jabecass/dl_tools_centos/PhylogicNDT/PhylogicNDT.py Cluster -i Test_Clust -s sample_01:input.maf::$purity:1 --maf_input_type calc_ccf

Here is the content of my input.maf file:

Hugo_Symbol	Chromosome	Start_position	Reference_Allele	Tumor_Seq_Allele2	t_ref_count	t_alt_count	local_cn_a1	local_cn_a2
Unknown	chr1	5	T	G	23.0	36.0	1.0	0.0
Unknown	chr1	15	C	T	55.0	7.0	1.0	0.0
Unknown	chr1	25	T	G	162.0	65.0	5.0	4.0
Unknown	chr1	35	T	G	113.0	62.0	1.0	1.0
Unknown	chr1	45	T	C	45.0	25.0	2.0	2.0
Unknown	chr1	55	C	T	96.0	12.0	2.0	1.0
Unknown	chr1	65	T	C	27.0	7.0	2.0	1.0
Unknown	chr1	75	T	G	89.0	4.0	1.0	1.0
Unknown	chr1	85	T	G	115.0	12.0	2.0	2.0

The purity is provided through a shell variable:

echo $purity
0.6354575886305256

I get the following error:

Traceback (most recent call last):
  File "/data/users/jabecass/dl_tools_centos/PhylogicNDT/PhylogicNDT.py", line 515, in <module>
    args.func(args)
  File "/bioinfo/users/jabecass/dl_tools_centos/PhylogicNDT/Cluster/Cluster.py", line 75, in run_tool
    purity=purity)
  File "/bioinfo/users/jabecass/dl_tools_centos/PhylogicNDT/data/Patient.py", line 148, in addSample
    purity=purity, timepoint_value=timepoint_value)
  File "/bioinfo/users/jabecass/dl_tools_centos/PhylogicNDT/data/Sample.py", line 92, in __init__
    _additional_muts=_additional_muts)  # a list of SomMutation objects
  File "/bioinfo/users/jabecass/dl_tools_centos/PhylogicNDT/data/Sample.py", line 129, in _load_sample_ccf
    mut_with_ccf_dat = self._read_ccf_from_txt(filen)
  File "/bioinfo/users/jabecass/dl_tools_centos/PhylogicNDT/data/Sample.py", line 324, in _read_ccf_from_txt
    raise ValueError("Number of CCF bins values read for this variant are less than 101 bins !")
ValueError: Number of CCF bins values read for this variant are less than 101 bins !
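Judging from the commands posted in this thread, the value passed to -s appears to be five colon-separated fields: sample ID, MAF path, segment file path (left empty when none is given), purity, and timepoint. The sketch below is not PhylogicNDT's own parser; `parse_sample_spec` is a hypothetical helper, and the field order is inferred from the examples here. It can help confirm that each value sits in the expected slot before a run.

```python
def parse_sample_spec(spec):
    """Split an '-s' style string into its five assumed fields.

    Assumed layout (inferred from the commands in this thread):
    sample_id:maf_fn:seg_fn:purity:timepoint
    Paths containing ':' are not supported by this sketch.
    """
    fields = spec.split(":")
    if len(fields) != 5:
        raise ValueError(
            "expected 5 colon-separated fields, got %d: %r" % (len(fields), spec)
        )
    sample_id, maf_fn, seg_fn, purity, timepoint = fields
    return {
        "sample_id": sample_id,
        "maf_fn": maf_fn,
        "seg_fn": seg_fn or None,  # an empty field means no segment file
        "purity": float(purity),
        "timepoint": float(timepoint),
    }


print(parse_sample_spec("sample_01:input.maf::0.6354575886305256:1"))
```

A purity that fails to convert to a float, for example because a shell variable was unset, raises immediately rather than surfacing later as a confusing downstream error.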

ahgillmo commented 4 years ago

Hello, I am also trying to calculate the CCF using PhylogicNDT and I am getting the same error as @judithabk6. My command and input files are below:

python ../../PhylogicNDT.py Cluster -i SMP8_P1_1CNVGuessTest -s SMP8P1_1:SMP8_P1_1_input.Phylogic.txt::0.89:0 --maf_input_type calc_ccf

$ head SMP8_P1_1_input.Phylogic.txt

Hugo_Symbol	Chromosome	Start_position	Reference_Allele	Tumor_Seq_Allele2	t_ref_count	t_alt_count	local_cn_a1	local_cn_a2
LARP4B	chr10	813092	T	-	14	2	2	1.4945
PITRM1	chr10	3157019	A	C	20	4	2	1.4945
AKR1C3	chr10	5099336	A	-	16	2	2	1.4945
PRKCQ	chr10	6483444	-	A	15	2	2	1.4945
DHTKD1	chr10	12094299	-	C	14	3	2	1.4945
VIM	chr10	17230636	C	T	21	3	2	1.4945
ST8SIA6	chr10	17327015	-	A	13	3	2	1.4945
SKIDA1	chr10	21516039	A	G	16	2	2	1.4945
SKIDA1	chr10	21516734	-	G	21	5	2	1.4945

The error is:

  File "../../PhylogicNDT.py", line 515, in <module>
    args.func(args)
  File "/home/ahgillmo/PhylogicNDT/Cluster/Cluster.py", line 75, in run_tool
    purity=purity)
  File "/home/ahgillmo/PhylogicNDT/data/Patient.py", line 148, in addSample
    purity=purity, timepoint_value=timepoint_value)
  File "/home/ahgillmo/PhylogicNDT/data/Sample.py", line 92, in __init__
    _additional_muts=_additional_muts)  # a list of SomMutation objects
  File "/home/ahgillmo/PhylogicNDT/data/Sample.py", line 129, in _load_sample_ccf
    mut_with_ccf_dat = self._read_ccf_from_txt(filen)
  File "/home/ahgillmo/PhylogicNDT/data/Sample.py", line 324, in _read_ccf_from_txt
    raise ValueError("Number of CCF bins values read for this variant are less than 101 bins !")
ValueError: Number of CCF bins values read for this variant are less than 101 bins !

Any comments or ideas would be appreciated.

Aaron

judithabk6 commented 4 years ago

@ahgillmo I have provided a fix for this and submitted a pull request (https://github.com/broadinstitute/PhylogicNDT/pull/38). It has not been accepted yet, but you can patch your local code with the same modification. HTH

ahgillmo commented 4 years ago

Great, that seems to have fixed my problem! Thank you.

Oufra commented 2 years ago

Hello,

I am struggling with the same issue that @judithabk6 and @ahgillmo described, but the fix provided by @judithabk6 doesn't solve it. I'm working on exome data.

My command is as follows:

singularity exec phylogicndt.sif PhylogicNDT.py Cluster -i D522R01 -s D522R01:/data/kdi_prod/.kdi/project_workspace_0/1348/acl/11.00/data/ITH_inputs/phylogicNDT_inputs/D522R01_phylogicndt.maf:/data/kdi_prod/.kdi/project_workspace_0/1348/acl/11.00/data/ITH_inputs/phylogicNDT_inputs/D522R01_phylogicndt_cnv.tsv:0.718965360764431:1 --maf_input_type calc_ccf

Here is what my .maf input looks like (first ten lines):

Hugo_Symbol	Chromosome	Start_position	Reference_Allele	Tumor_Seq_Allele2	t_ref_count	t_alt_count	local_cn_a2	local_cn_a1
NBPF2P	chr1	21423172	G	C	20	3	1	1
MDS2	chr1	23627383	C	G	81	46	1	1
LDLRAP1	chr1	25565104	G	A	314	174	1	1
HIVEP3	chr1	41506413	T	A	16	7	1	1
WNT2B	chr1	112526263	ACCACCAGTACCATGTG	-	20	2	1	1
GPR161	chr1	168096935	A	T	573	229	2	1
BATF3	chr1	212686503	G	A	40	11	2	1
ENAH	chr1	225511728	TGGGGAAAGGGGGAATTTTTAAC	-	19	2	2	1
MTR	chr1	236884979	G	T	32	4	2	1

After applying the fix proposed by @judithabk6, here is how my Cluster.py file looks (lines 72 to 76):

            patient_data.addSample(maf_fn, sample_id, timepoint_value=timepoint, grid_size=arg$
                                   _additional_muts=None,
                                   seg_file=seg_fn,
                                   purity=purity, input_type=args.maf_input_type)

But I still get the following error (the same as before the fix):

Traceback (most recent call last):
  File "/data/kdi_prod/.kdi/project_workspace_0/1348/acl/11.00/bin/phylogicndt//PhylogicNDT.py", line 515, in <module>
    args.func(args)
  File "/data/kdi_prod/.kdi/project_workspace_0/1348/acl/11.00/bin/phylogicndt/Cluster/Cluster.py", line 75, in run_tool
    purity=purity)
  File "/data/kdi_prod/.kdi/project_workspace_0/1348/acl/11.00/bin/phylogicndt/data/Patient.py", line 148, in addSample
    purity=purity, timepoint_value=timepoint_value)
  File "/data/kdi_prod/.kdi/project_workspace_0/1348/acl/11.00/bin/phylogicndt/data/Sample.py", line 92, in __init__
    _additional_muts=_additional_muts)  # a list of SomMutation objects
  File "/data/kdi_prod/.kdi/project_workspace_0/1348/acl/11.00/bin/phylogicndt/data/Sample.py", line 129, in _load_sample_ccf
    mut_with_ccf_dat = self._read_ccf_from_txt(filen)
  File "/data/kdi_prod/.kdi/project_workspace_0/1348/acl/11.00/bin/phylogicndt/data/Sample.py", line 324, in _read_ccf_from_txt
    raise ValueError("Number of CCF bins values read for this variant are less than 101 bins !")
ValueError: Number of CCF bins values read for this variant are less than 101 bins !

I get the same error on all of the ~500 exomes I'm analyzing.

Am I missing something?

miachom commented 2 years ago

Hi, I am working on WGS data and am also getting the same error after the changes in Cluster.py. Has anyone found a solution to this? Thanks

Oufra commented 2 years ago

Hi @miachom. I ended up using a .sif file instead of a command line-defined design, and it worked.
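For anyone wanting to try the same route, the sketch below builds a tab-delimited sample information file. The column names (sample_id, maf_fn, seg_fn, purity, timepoint) and the idea of passing the file with a -sif option are assumptions based on my reading of the PhylogicNDT README and should be checked against the version you have installed. The helper `build_sif` and the file names are hypothetical.

```python
import csv
import io

# Assumed header; verify against the README of your PhylogicNDT version.
SIF_COLUMNS = ["sample_id", "maf_fn", "seg_fn", "purity", "timepoint"]


def build_sif(samples):
    """Return tab-delimited text with one row per sample.

    `samples` is an iterable of (sample_id, maf_fn, seg_fn, purity, timepoint)
    tuples; use an empty string for seg_fn when there is no segment file.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(SIF_COLUMNS)
    for row in samples:
        if len(row) != len(SIF_COLUMNS):
            raise ValueError("each sample needs %d fields" % len(SIF_COLUMNS))
        writer.writerow(row)
    return buf.getvalue()


text = build_sif([
    ("D522R01", "D522R01_phylogicndt.maf", "", 0.718965360764431, 1),
])
print(text)
```

The returned text can be written to a file such as Patient.sif and supplied in place of the -s arguments.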
