broadinstitute / PhylogicNDT

ccf histogram calculation #10

Open rbonneville opened 5 years ago

rbonneville commented 5 years ago

Hello, we are interested in testing PhylogicNDT with our own multi-sample sequencing data. How should we compute raw ccf histograms for clustering? Thank you.

alipsky commented 5 years ago

We are also interested in this. Are you able to provide an example?

iglc commented 5 years ago

The easiest way is to run ABSOLUTE. We will push a tool that autogenerates the histograms if you have alt/ref counts and absolute copy-number data.

abenjak commented 5 years ago

To fix the generation of CCF histograms in ABSOLUTEv1.06, see https://github.com/broadinstitute/PhylogicNDT/issues/4#issuecomment-555588341

judithabk6 commented 4 years ago

I am facing the same issue. As far as I understand, this R script computes the CCF histograms but requires the multiplicity of each SNV. Is the multiplicity inferred by ABSOLUTE? How can it be assessed prior to subclonal reconstruction, in particular for WES data?

jcha40 commented 4 years ago

Hello,

If you clone the latest update, you should be able to get PhylogicNDT to calculate the CCF input histograms without ABSOLUTE input by setting the --maf_input_type flag to calc_ccf. PhylogicNDT requires sample purity and local copy number for each mutation (column names local_cn_a1 and local_cn_a2) in order to calculate the CCF.
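
To make the inputs above concrete, here is a minimal sketch of how a 101-bin CCF histogram could be derived from alt/ref counts, purity, and local copy number. It is not PhylogicNDT's own code. It assumes a plain binomial likelihood, a diploid normal, and a fixed multiplicity of 1 (the quantity asked about earlier in the thread); the function name `ccf_histogram` and its parameters are hypothetical.

```python
import math


def ccf_histogram(alt, ref, purity, cn_a1, cn_a2, multiplicity=1, n_bins=101):
    """Return a normalized CCF histogram over n_bins evenly spaced values in [0, 1].

    Assumptions (not taken from PhylogicNDT): binomial likelihood, flat prior,
    diploid normal cells, and a fixed mutation multiplicity.
    """
    total_cn = cn_a1 + cn_a2
    # Average number of copies of the locus per cell in the tumor/normal mixture.
    denom = purity * total_cn + 2.0 * (1.0 - purity)
    if denom <= 0:
        raise ValueError("purity and copy number give no copies of the locus")

    log_lik = []
    for i in range(n_bins):
        ccf = i / (n_bins - 1)
        # Expected allele fraction if a fraction `ccf` of cancer cells carry the mutation.
        af = purity * multiplicity * ccf / denom
        # Clamp so the logarithms below stay finite.
        af = min(max(af, 1e-9), 1.0 - 1e-9)
        # The binomial coefficient is constant in ccf, so it is omitted.
        log_lik.append(alt * math.log(af) + ref * math.log(1.0 - af))

    # Subtract the maximum before exponentiating to avoid underflow.
    peak = max(log_lik)
    weights = [math.exp(v - peak) for v in log_lik]
    total = sum(weights)
    return [w / total for w in weights]
```

Calling `ccf_histogram(t_alt_count, t_ref_count, purity, local_cn_a1, local_cn_a2)` for one MAF row returns 101 probabilities that sum to 1, one per CCF value from 0.00 to 1.00.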

judithabk6 commented 4 years ago

Hi, Thank you for this update. I am not sure I am running it correctly though.

I tried:

/bioinfo/users/jabecass/dl_tools_centos/PhylogicNDT/PhylogicNDT.py Cluster -i Test_Clust -s sample_01:input.maf::$purity:1 --maf_input_type calc_ccf

and here is the content of my input.maf file

Hugo_Symbol  Chromosome  Start_position  Reference_Allele  Tumor_Seq_Allele2  t_ref_count  t_alt_count  local_cn_a1  local_cn_a2
Unknown      chr1        5               T                 G                  23.0         36.0         1.0          0.0
Unknown      chr1        15              C                 T                  55.0         7.0          1.0          0.0
Unknown      chr1        25              T                 G                  162.0        65.0         5.0          4.0
Unknown      chr1        35              T                 G                  113.0        62.0         1.0          1.0
Unknown      chr1        45              T                 C                  45.0         25.0         2.0          2.0
Unknown      chr1        55              C                 T                  96.0         12.0         2.0          1.0
Unknown      chr1        65              T                 C                  27.0         7.0          2.0          1.0
Unknown      chr1        75              T                 G                  89.0         4.0          1.0          1.0
Unknown      chr1        85              T                 G                  115.0        12.0         2.0          2.0

and the purity is provided

echo $purity
0.6354575886305256

I get the error

Traceback (most recent call last):
  File "/data/users/jabecass/dl_tools_centos/PhylogicNDT/PhylogicNDT.py", line 515, in <module>
    args.func(args)
  File "/bioinfo/users/jabecass/dl_tools_centos/PhylogicNDT/Cluster/Cluster.py", line 75, in run_tool
    purity=purity)
  File "/bioinfo/users/jabecass/dl_tools_centos/PhylogicNDT/data/Patient.py", line 148, in addSample
    purity=purity, timepoint_value=timepoint_value)
  File "/bioinfo/users/jabecass/dl_tools_centos/PhylogicNDT/data/Sample.py", line 92, in __init__
    _additional_muts=_additional_muts)  # a list of SomMutation objects
  File "/bioinfo/users/jabecass/dl_tools_centos/PhylogicNDT/data/Sample.py", line 129, in _load_sample_ccf
    mut_with_ccf_dat = self._read_ccf_from_txt(filen)
  File "/bioinfo/users/jabecass/dl_tools_centos/PhylogicNDT/data/Sample.py", line 324, in _read_ccf_from_txt
    raise ValueError("Number of CCF bins values read for this variant are less than 101 bins !")
ValueError: Number of CCF bins values read for this variant are less than 101 bins !

ahgillmo commented 4 years ago

Hello, I am also trying to calculate the CCF using PhylogicNDT and I am getting the same error as @judithabk6. My command and input files are below:

python ../../PhylogicNDT.py Cluster -i SMP8_P1_1CNVGuessTest -s SMP8P1_1:SMP8_P1_1_input.Phylogic.txt::0.89:0 --maf_input_type calc_ccf

$head SMP8_P1_1_input.Phylogic.txt

Hugo_Symbol Chromosome Start_position Reference_Allele Tumor_Seq_Allele2 t_ref_count t_alt_count local_cn_a1 local_cn_a2
LARP4B chr10 813092 T - 14 2 2 1.4945
PITRM1 chr10 3157019 A C 20 4 2 1.4945
AKR1C3 chr10 5099336 A - 16 2 2 1.4945
PRKCQ chr10 6483444 - A 15 2 2 1.4945
DHTKD1 chr10 12094299 - C 14 3 2 1.4945
VIM chr10 17230636 C T 21 3 2 1.4945
ST8SIA6 chr10 17327015 - A 13 3 2 1.4945
SKIDA1 chr10 21516039 A G 16 2 2 1.4945
SKIDA1 chr10 21516734 - G 21 5 2 1.4945

The error is:

  File "../../PhylogicNDT.py", line 515, in <module>
    args.func(args)
  File "/home/ahgillmo/PhylogicNDT/Cluster/Cluster.py", line 75, in run_tool
    purity=purity)
  File "/home/ahgillmo/PhylogicNDT/data/Patient.py", line 148, in addSample
    purity=purity, timepoint_value=timepoint_value)
  File "/home/ahgillmo/PhylogicNDT/data/Sample.py", line 92, in __init__
    _additional_muts=_additional_muts)  # a list of SomMutation objects
  File "/home/ahgillmo/PhylogicNDT/data/Sample.py", line 129, in _load_sample_ccf
    mut_with_ccf_dat = self._read_ccf_from_txt(filen)
  File "/home/ahgillmo/PhylogicNDT/data/Sample.py", line 324, in _read_ccf_from_txt
    raise ValueError("Number of CCF bins values read for this variant are less than 101 bins !")
ValueError: Number of CCF bins values read for this variant are less than 101 bins !

Any comments or ideas would be appreciated, Aaron

judithabk6 commented 4 years ago

@ahgillmo I have provided a solution for that and submitted a pull request (https://github.com/broadinstitute/PhylogicNDT/pull/38). It has not been accepted yet, but you can patch your local code with the same modification. HTH

ahgillmo commented 4 years ago

Great, that seems to have fixed my problem! Thank you.

Oufra commented 2 years ago

Hello,

I am struggling with the same issue that @judithabk6 and @ahgillmo described, but the fix provided by @judithabk6 doesn't solve it. I'm working on exome data.

My command is as follows:

singularity exec phylogicndt.sif PhylogicNDT.py Cluster -i D522R01 -s D522R01:/data/kdi_prod/.kdi/project_workspace_0/1348/acl/11.00/data/ITH_inputs/phylogicNDT_inputs/D522R01_phylogicndt.maf:/data/kdi_prod/.kdi/project_workspace_0/1348/acl/11.00/data/ITH_inputs/phylogicNDT_inputs/D522R01_phylogicndt_cnv.tsv:0.718965360764431:1 --maf_input_type calc_ccf

Here's what my .maf input looks like (first ten lines):

Hugo_Symbol Chromosome Start_position Reference_Allele Tumor_Seq_Allele2 t_ref_count t_alt_count local_cn_a2 local_cn_a1
NBPF2P chr1 21423172 G C 20 3 1 1
MDS2 chr1 23627383 C G 81 46 1 1
LDLRAP1 chr1 25565104 G A 314 174 1 1
HIVEP3 chr1 41506413 T A 16 7 1 1
WNT2B chr1 112526263 ACCACCAGTACCATGTG - 20 2 1 1
GPR161 chr1 168096935 A T 573 229 2 1
BATF3 chr1 212686503 G A 40 11 2 1
ENAH chr1 225511728 TGGGGAAAGGGGGAATTTTTAAC - 19 2 2 1
MTR chr1 236884979 G T 32 4 2 1

After applying the fix proposed by @judithabk6, here's how my Cluster.py file looks (lines 72 to 76):

            patient_data.addSample(maf_fn, sample_id, timepoint_value=timepoint, grid_size=arg$
                                   _additional_muts=None,
                                   seg_file=seg_fn,
                                   purity=purity, input_type=args.maf_input_type)

But I still get the following error (the same as before the fix):

Traceback (most recent call last):
  File "/data/kdi_prod/.kdi/project_workspace_0/1348/acl/11.00/bin/phylogicndt//PhylogicNDT.py", line 515, in <module>
    args.func(args)
  File "/data/kdi_prod/.kdi/project_workspace_0/1348/acl/11.00/bin/phylogicndt/Cluster/Cluster.py", line 75, in run_tool
    purity=purity)
  File "/data/kdi_prod/.kdi/project_workspace_0/1348/acl/11.00/bin/phylogicndt/data/Patient.py", line 148, in addSample
    purity=purity, timepoint_value=timepoint_value)
  File "/data/kdi_prod/.kdi/project_workspace_0/1348/acl/11.00/bin/phylogicndt/data/Sample.py", line 92, in __init__
    _additional_muts=_additional_muts)  # a list of SomMutation objects
  File "/data/kdi_prod/.kdi/project_workspace_0/1348/acl/11.00/bin/phylogicndt/data/Sample.py", line 129, in _load_sample_ccf
    mut_with_ccf_dat = self._read_ccf_from_txt(filen)
  File "/data/kdi_prod/.kdi/project_workspace_0/1348/acl/11.00/bin/phylogicndt/data/Sample.py", line 324, in _read_ccf_from_txt
    raise ValueError("Number of CCF bins values read for this variant are less than 101 bins !")
ValueError: Number of CCF bins values read for this variant are less than 101 bins !

I get the same error on all of the ~500 exomes I'm analyzing.

Am I missing something?

miachom commented 2 years ago

Hi, I am working on WGS data and am also getting the same error after the changes in Cluster.py. Has anyone found a solution to this? Thanks

Oufra commented 2 years ago

Hi @miachom. I ended up describing my samples in a .sif file instead of defining them on the command line, and it worked.
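
For anyone trying the same workaround, below is a rough sketch that turns the colon-separated sample strings used with `-s` earlier in this thread into a tab-delimited sample file. The field order (sample ID, MAF path, copy-number file, purity, timepoint) is inferred from the commands above. The header names in `SIF_COLUMNS`, the helper names, and the idea that the file is passed with a `-sif` option are assumptions that should be checked against the PhylogicNDT README or `PhylogicNDT.py Cluster --help`.

```python
import csv

# Assumed header for the sample information file; verify against the README.
SIF_COLUMNS = ["sample_id", "maf_fn", "seg_fn", "purity", "timepoint"]


def spec_to_row(spec):
    """Split an 'id:maf:seg:purity:timepoint' string into a dict keyed by SIF_COLUMNS.

    Paths that themselves contain ':' are not handled in this sketch.
    """
    parts = spec.split(":")
    if len(parts) != len(SIF_COLUMNS):
        raise ValueError("expected 5 colon-separated fields, got %d" % len(parts))
    return dict(zip(SIF_COLUMNS, parts))


def write_sif(specs, path):
    """Write one tab-delimited row per sample string, preceded by a header row."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=SIF_COLUMNS, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        for spec in specs:
            writer.writerow(spec_to_row(spec))
```

For example, `write_sif(["sample_01:input.maf::0.6354575886305256:1"], "Test_Clust.sif")` writes a header plus one row with an empty copy-number field.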