Open rbonneville opened 5 years ago
We are also interested in this. Are you able to provide an example?
The easiest way is to run ABSOLUTE. We will push a tool that would autogenerate it if you have alt/ref counts and absolute copy-number data.
To fix the generation of CCF histograms in ABSOLUTEv1.06, see https://github.com/broadinstitute/PhylogicNDT/issues/4#issuecomment-555588341
I am facing the same issue. As far as I understand, this R script computes the CCF histograms but requires the multiplicity of each SNV. Is the multiplicity inferred by ABSOLUTE? How can it be assessed prior to subclonal reconstruction, in particular for WES data?
Hello,
If you clone the latest update, you should be able to get PhylogicNDT to calculate the CCF input histograms without ABSOLUTE input by setting the `--maf_input_type` flag to `calc_ccf`. PhylogicNDT requires sample purity and local copy number for each mutation (column names `local_cn_a1` and `local_cn_a2`) in order to calculate the CCF.
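For readers wondering how purity, local copy number, and read counts can combine into a CCF histogram, here is a minimal Python sketch. It is not PhylogicNDT's implementation. The function name `ccf_histogram`, the default `multiplicity=1`, the flat prior over CCF, and the simple binomial read model are all assumptions made for illustration. The 101-point grid matches the bin count named in the error messages in this thread.

```python
import math

def ccf_histogram(alt, ref, purity, cn_a1, cn_a2, multiplicity=1, n_bins=101):
    """Normalized likelihood of each CCF value on an even grid over [0, 1].

    Illustrative only: assumes a binomial read model, a flat prior over CCF,
    and a known multiplicity (mutated copies per mutated cell).
    """
    # Average number of copies of the locus per cell in the tumor/normal mixture.
    denom = purity * (cn_a1 + cn_a2) + 2.0 * (1.0 - purity)
    if denom <= 0:
        raise ValueError("locus has no copies at this purity and copy number")

    log_lik = []
    for i in range(n_bins):
        ccf = i / (n_bins - 1)
        # Expected fraction of reads carrying the mutation at this CCF.
        af = purity * multiplicity * ccf / denom
        # Clamp away from 0 and 1 so the logarithms stay finite.
        af = min(max(af, 1e-12), 1.0 - 1e-12)
        log_lik.append(alt * math.log(af) + ref * math.log(1.0 - af))

    # Subtract the peak before exponentiating to avoid underflow.
    peak = max(log_lik)
    weights = [math.exp(v - peak) for v in log_lik]
    total = sum(weights)
    return [w / total for w in weights]

# Example call with hypothetical values: 7 alt reads, 55 ref reads,
# purity 0.64, one copy of each allele.
hist = ccf_histogram(alt=7, ref=55, purity=0.64, cn_a1=1, cn_a2=1)
```

The returned list has one entry per CCF bin and sums to 1. When the multiplicity is unknown, one option is to repeat the calculation for each plausible multiplicity and compare or combine the results, but that choice is beyond this sketch.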
Hi, thank you for this update. I am not sure I am running it correctly, though.
I tried
/bioinfo/users/jabecass/dl_tools_centos/PhylogicNDT/PhylogicNDT.py Cluster -i Test_Clust -s sample_01:input.maf::$purity:1 --maf_input_type calc_ccf
and here is the content of my input.maf file
Hugo_Symbol Chromosome Start_position Reference_Allele Tumor_Seq_Allele2 t_ref_count t_alt_count local_cn_a1 local_cn_a2
Unknown chr1 5 T G 23.0 36.0 1.0 0.0
Unknown chr1 15 C T 55.0 7.0 1.0 0.0
Unknown chr1 25 T G 162.0 65.0 5.0 4.0
Unknown chr1 35 T G 113.0 62.0 1.0 1.0
Unknown chr1 45 T C 45.0 25.0 2.0 2.0
Unknown chr1 55 C T 96.0 12.0 2.0 1.0
Unknown chr1 65 T C 27.0 7.0 2.0 1.0
Unknown chr1 75 T G 89.0 4.0 1.0 1.0
Unknown chr1 85 T G 115.0 12.0 2.0 2.0
and the purity is provided
echo $purity
0.6354575886305256
I get the error
Traceback (most recent call last):
File "/data/users/jabecass/dl_tools_centos/PhylogicNDT/PhylogicNDT.py", line 515, in <module>
args.func(args)
File "/bioinfo/users/jabecass/dl_tools_centos/PhylogicNDT/Cluster/Cluster.py", line 75, in run_tool
purity=purity)
File "/bioinfo/users/jabecass/dl_tools_centos/PhylogicNDT/data/Patient.py", line 148, in addSample
purity=purity, timepoint_value=timepoint_value)
File "/bioinfo/users/jabecass/dl_tools_centos/PhylogicNDT/data/Sample.py", line 92, in __init__
_additional_muts=_additional_muts) # a list of SomMutation objects
File "/bioinfo/users/jabecass/dl_tools_centos/PhylogicNDT/data/Sample.py", line 129, in _load_sample_ccf
mut_with_ccf_dat = self._read_ccf_from_txt(filen)
File "/bioinfo/users/jabecass/dl_tools_centos/PhylogicNDT/data/Sample.py", line 324, in _read_ccf_from_txt
raise ValueError("Number of CCF bins values read for this variant are less than 101 bins !")
ValueError: Number of CCF bins values read for this variant are less than 101 bins !
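When debugging a run like the one above, it can help to first rule out input-format problems. The following Python sketch checks that a MAF file is tab-delimited and has the columns used in this thread. The column list is copied from the inputs posted here, not from the PhylogicNDT source, and the helper names `read_header` and `missing_columns` are hypothetical. Passing this check does not guarantee that PhylogicNDT will accept the file.

```python
import csv

# Column names as they appear in the inputs posted in this thread (an assumption,
# not an authoritative list of what PhylogicNDT requires).
REQUIRED_COLUMNS = [
    "Hugo_Symbol", "Chromosome", "Start_position", "Reference_Allele",
    "Tumor_Seq_Allele2", "t_ref_count", "t_alt_count",
    "local_cn_a1", "local_cn_a2",
]

def read_header(maf_path):
    """Return the first non-comment row of a tab-delimited file as a list of names."""
    with open(maf_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Skip blank lines and '#' comment lines that some MAF files start with.
            if row and not row[0].startswith("#"):
                return [name.strip() for name in row]
    return []

def missing_columns(maf_path, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the header."""
    present = set(read_header(maf_path))
    return [name for name in required if name not in present]
```

An empty list means every expected column was found. If the file is space-delimited rather than tab-delimited, the whole header is read as a single field and every column is reported as missing, which is a quick way to spot a delimiter problem.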
Hello, I am also trying to calculate the CCF using PhylogicNDT and I am getting the same error as @judithabk6. My command and input files are below:
python ../../PhylogicNDT.py Cluster -i SMP8_P1_1CNVGuessTest -s SMP8P1_1:SMP8_P1_1_input.Phylogic.txt::0.89:0 --maf_input_type calc_ccf
$ head SMP8_P1_1_input.Phylogic.txt
Hugo_Symbol | Chromosome | Start_position | Reference_Allele | Tumor_Seq_Allele2 | t_ref_count | t_alt_count | local_cn_a1 | local_cn_a2 |
---|---|---|---|---|---|---|---|---|
LARP4B | chr10 | 813092 | T | - | 14 | 2 | 2 | 1.4945 |
PITRM1 | chr10 | 3157019 | A | C | 20 | 4 | 2 | 1.4945 |
AKR1C3 | chr10 | 5099336 | A | - | 16 | 2 | 2 | 1.4945 |
PRKCQ | chr10 | 6483444 | - | A | 15 | 2 | 2 | 1.4945 |
DHTKD1 | chr10 | 12094299 | - | C | 14 | 3 | 2 | 1.4945 |
VIM | chr10 | 17230636 | C | T | 21 | 3 | 2 | 1.4945 |
ST8SIA6 | chr10 | 17327015 | - | A | 13 | 3 | 2 | 1.4945 |
SKIDA1 | chr10 | 21516039 | A | G | 16 | 2 | 2 | 1.4945 |
SKIDA1 | chr10 | 21516734 | - | G | 21 | 5 | 2 | 1.4945 |
The error is
File "../../PhylogicNDT.py", line 515, in <module>
args.func(args)
File "/home/ahgillmo/PhylogicNDT/Cluster/Cluster.py", line 75, in run_tool
purity=purity)
File "/home/ahgillmo/PhylogicNDT/data/Patient.py", line 148, in addSample
purity=purity, timepoint_value=timepoint_value)
File "/home/ahgillmo/PhylogicNDT/data/Sample.py", line 92, in __init__
_additional_muts=_additional_muts) # a list of SomMutation objects
File "/home/ahgillmo/PhylogicNDT/data/Sample.py", line 129, in _load_sample_ccf
mut_with_ccf_dat = self._read_ccf_from_txt(filen)
File "/home/ahgillmo/PhylogicNDT/data/Sample.py", line 324, in _read_ccf_from_txt
raise ValueError("Number of CCF bins values read for this variant are less than 101 bins !")
ValueError: Number of CCF bins values read for this variant are less than 101 bins !
Any comments or ideas would be appreciated, Aaron
@ahgillmo I have provided a solution for that and submitted a pull request (https://github.com/broadinstitute/PhylogicNDT/pull/38). It has not been accepted yet, but you can patch your local code with the same modification. HTH
Great, that seems to have fixed my problem! Thank you.
Hello,
I am struggling with the same issue that @judithabk6 and @ahgillmo described, but the fix provided by @judithabk6 doesn't solve it. I'm working on exome data.
My command is as follows:
singularity exec phylogicndt.sif PhylogicNDT.py Cluster -i D522R01 -s D522R01:/data/kdi_prod/.kdi/project_workspace_0/1348/acl/11.00/data/ITH_inputs/phylogicNDT_inputs/D522R01_phylogicndt.maf:/data/kdi_prod/.kdi/project_workspace_0/1348/acl/11.00/data/ITH_inputs/phylogicNDT_inputs/D522R01_phylogicndt_cnv.tsv:0.718965360764431:1 --maf_input_type calc_ccf
Here's what my .maf input looks like (first ten lines):
Hugo_Symbol | Chromosome | Start_position | Reference_Allele | Tumor_Seq_Allele2 | t_ref_count | t_alt_count | local_cn_a2 | local_cn_a1 |
---|---|---|---|---|---|---|---|---|
NBPF2P | chr1 | 21423172 | G | C | 20 | 3 | 1 | 1 |
MDS2 | chr1 | 23627383 | C | G | 81 | 46 | 1 | 1 |
LDLRAP1 | chr1 | 25565104 | G | A | 314 | 174 | 1 | 1 |
HIVEP3 | chr1 | 41506413 | T | A | 16 | 7 | 1 | 1 |
WNT2B | chr1 | 112526263 | ACCACCAGTACCATGTG | - | 20 | 2 | 1 | 1 |
GPR161 | chr1 | 168096935 | A | T | 573 | 229 | 2 | 1 |
BATF3 | chr1 | 212686503 | G | A | 40 | 11 | 2 | 1 |
ENAH | chr1 | 225511728 | TGGGGAAAGGGGGAATTTTTAAC | - | 19 | 2 | 2 | 1 |
MTR | chr1 | 236884979 | G | T | 32 | 4 | 2 | 1 |
And after applying the fix proposed by @judithabk6, here's how my Cluster.py file looks (lines 72 to 76):
patient_data.addSample(maf_fn, sample_id, timepoint_value=timepoint, grid_size=arg$
_additional_muts=None,
seg_file=seg_fn,
purity=purity, input_type=args.maf_input_type)
But I still get the following error (same as before the fix)
Traceback (most recent call last):
File "/data/kdi_prod/.kdi/project_workspace_0/1348/acl/11.00/bin/phylogicndt//PhylogicNDT.py", line 515, in <module>
args.func(args)
File "/data/kdi_prod/.kdi/project_workspace_0/1348/acl/11.00/bin/phylogicndt/Cluster/Cluster.py", line 75, in run_tool
purity=purity)
File "/data/kdi_prod/.kdi/project_workspace_0/1348/acl/11.00/bin/phylogicndt/data/Patient.py", line 148, in addSample
purity=purity, timepoint_value=timepoint_value)
File "/data/kdi_prod/.kdi/project_workspace_0/1348/acl/11.00/bin/phylogicndt/data/Sample.py", line 92, in __init__
_additional_muts=_additional_muts) # a list of SomMutation objects
File "/data/kdi_prod/.kdi/project_workspace_0/1348/acl/11.00/bin/phylogicndt/data/Sample.py", line 129, in _load_sample_ccf
mut_with_ccf_dat = self._read_ccf_from_txt(filen)
File "/data/kdi_prod/.kdi/project_workspace_0/1348/acl/11.00/bin/phylogicndt/data/Sample.py", line 324, in _read_ccf_from_txt
raise ValueError("Number of CCF bins values read for this variant are less than 101 bins !")
ValueError: Number of CCF bins values read for this variant are less than 101 bins !
I get the same error on all of the ~500 exomes I'm analyzing.
Am I missing something?
Hi, I am working on WGS data and am also getting the same error after making the changes in Cluster.py. Has anyone found a solution to this? Thanks
Hi @miachom. I ended up using a .sif file instead of a command line-defined design, and it worked.
Hello, we are interested in testing PhylogicNDT with our own multi-sample sequencing data. How should we compute the raw CCF histograms for clustering? Thank you.