AlexandrovLab / SigProfilerClusters

Tool for analyzing the inter-mutational distances between SNV-SNV and INDEL-INDEL mutations. Tool separates mutations into clustered and non-clustered groups on a sample-dependent basis.
BSD 2-Clause "Simplified" License

FileNotFoundError: [Errno 2] No such file or directory: './Allen_Pat02_ID/output/vcf_files_corrected/cancer_clustered/INDEL/output/ID/cancer_clustered.ID83.all' #9

Closed shishuo16 closed 2 years ago

shishuo16 commented 2 years ago

Hi, when I used SigProfilerClusters to generate clustered indel results with the code below:

import sys
from SigProfilerMatrixGenerator import install as genInstall
from SigProfilerSimulator import SigProfilerSimulator as sigSim
from SigProfilerClusters import SigProfilerClusters as hp
if __name__ == '__main__':
    sigSim.SigProfilerSimulator("cancer", "./Allen_Pat110_ID/", "GRCh37", contexts = ["ID"], simulations=100)
    hp.analysis("cancer", "GRCh37", "ID", ["ID"], "./Allen_Pat110_ID/", analysis="all", interdistance='ID', sortSims=True, subClassify=True, correction=True, calculateIMD=True, max_cpu=1, TCGA=True, sanger=False)

I got the following error message in the "SigProfilerClusters_cancer_GRCh37_2022-04-25.err" file:

Traceback (most recent call last):
  File "/share/home/shuo.shi/APOBEC/SigProfiler//script/cluster.ID.py", line 7, in <module>
    hp.analysis(sys.argv[1], "GRCh37", "ID", ["ID"], sys.argv[2], analysis="all", interdistance='ID', sortSims=True, subClassify=True, correction=True, calculateIMD=True, max_cpu=1, TCGA=True, sanger=False,exome=True)
  File "/share/home/shuo.shi/anaconda3/lib/python3.9/site-packages/SigProfilerClusters/SigProfilerClusters.py", line 590, in analysis
    regions, imds = hotspot.hotSpotAnalysis(project, genome, contexts, simContext, ref_dir, windowSize, processors, plotIMDfigure, exome, chromLengths, binsDensity, original, signature, percentage, firstRun, clustering_vaf, calculateIMD, chrom_based, correction)
  File "/share/home/shuo.shi/anaconda3/lib/python3.9/site-packages/SigProfilerClusters/hotspot.py", line 1151, in hotSpotAnalysis
    with open(ref_dir + "output/vcf_files" + path_suffix + "/" + project + "_clustered/INDEL/output/ID/" + project + "_clustered.ID83.all" ) as f:
FileNotFoundError: [Errno 2] No such file or directory: './Allen_Pat110_ID/output/vcf_files_corrected/cancer_clustered/INDEL/output/ID/cancer_clustered.ID83.all'

I also tried the BRCA_example.vcf (https://osf.io/qpmzw/wiki/5.%20Quick%20Start%20Example/) with the same parameter settings, as below:

import sys
from SigProfilerMatrixGenerator import install as genInstall
from SigProfilerSimulator import SigProfilerSimulator as sigSim
from SigProfilerClusters import SigProfilerClusters as hp
if __name__ == '__main__':
    sigSim.SigProfilerSimulator("BRCA", "./example/", "GRCh37", contexts = ["ID"], simulations=100)
    hp.analysis("BRCA", "GRCh37", "ID", ["ID"], "./example/", analysis="all", interdistance='ID', sortSims=True, subClassify=True, correction=True, calculateIMD=True, max_cpu=1, TCGA=True, sanger=False)

However, this time I got the following error message in the SigProfilerSimulator_BRCA_GRCh37_2022-04-26.err file:

Traceback (most recent call last):
  File "/share/home/shuo.shi/APOBEC/SigProfiler/cluster.ID.example.py", line 6, in <module>
    sigSim.SigProfilerSimulator("BRCA", "./example/", "GRCh37", contexts = ["ID"], simulations=100)
  File "/share/home/shuo.shi/anaconda3/lib/python3.9/site-packages/SigProfilerSimulator/SigProfilerSimulator.py", line 395, in SigProfilerSimulator
    sample_names, mut_prep = simScript.mutation_preparation(catalogue_files, log_file)
  File "/share/home/shuo.shi/anaconda3/lib/python3.9/site-packages/SigProfilerSimulator/mutational_simulator.py", line 498, in mutation_preparation
    with open (catalogue_files[context]) as f:
FileNotFoundError: [Errno 2] No such file or directory: './example/output/ID/BRCA.ID83.all'

Please let me know how to fix it.

ebergstr commented 2 years ago

Hi,

This is caused by your sample(s) not having any clustered mutations. I have updated the tool (version 1.0.9) to include an output statement notifying the user when no clustered mutations are found. Please update your version of the tool.

The example BRCA sample does not contain indels, which is why you are seeing this issue with that file.
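For anyone hitting the same FileNotFoundError, a quick pre-check along the following lines may help. This is only a sketch: `count_indels` and `expected_id_matrix` are hypothetical helpers, not part of SigProfilerClusters or SigProfilerSimulator. The matrix path layout is copied from the simulator traceback above, and the indel test assumes the standard VCF column order (CHROM, POS, ID, REF, ALT).

```python
import os


def count_indels(vcf_lines):
    """Count records whose REF and ALT alleles differ in length."""
    total = 0
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split()
        ref, alts = fields[3], fields[4].split(",")
        if any(len(alt) != len(ref) for alt in alts):
            total += 1
    return total


def expected_id_matrix(project_dir, project):
    """Build the ID83 matrix path that the simulator traceback tried to open."""
    return os.path.join(project_dir, "output", "ID", project + ".ID83.all")


# Tiny illustrative input (made-up records, not the real BRCA example file).
example = [
    "#CHROM\tPOS\tID\tREF\tALT",
    "X\t13755496\ts1\tGTT\tG",  # deletion: counted
    "1\t100\ts1\tA\tT",         # single-base substitution: not counted
]
n_indels = count_indels(example)
matrix_path = expected_id_matrix("./example/", "BRCA")
matrix_exists = os.path.exists(matrix_path)
```

If `n_indels` is zero for the input VCF, or `matrix_exists` is false after matrix generation, the ID analysis presumably has nothing to work with, which matches the explanation given here.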

Thanks!

shishuo16 commented 2 years ago

Hi, thanks for your quick reply. I updated SigProfilerClusters to version 1.0.9 as you suggested and ran the code with a VCF that contains indels. This time it seems to work. However, when I check the result file "output/clustered/ID/test.vcf", I cannot find the cluster info.

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
X   15429073    test    T   CTC .   .   ClusteredAnalysis
X   15429355    test    G   ATC .   .   ClusteredAnalysis
X   16603461    test    C   GTC .   .   ClusteredAnalysis
X   16603493    test    C   ATC .   .   ClusteredAnalysis
X   16603885    test    C   GTC .   .   ClusteredAnalysis
X   16603954    test    C   GTC .   .   ClusteredAnalysis
X   16604434    test    C   TTC .   .   ClusteredAnalysis
X   16604571    test    C   TTC .   .   ClusteredAnalysis
X   16606097    test    C   GTC .   .   ClusteredAnalysis

How so?

shishuo16 commented 2 years ago

Also, when I set exome=True, there is no notification about missing clustered mutations. Instead, it generates the following error message in the SigProfilerClusters_cancer_GRCh37_2022-04-27.err file:

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/share/home/shuo.shi/anaconda3/envs/PY3.7/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/share/home/shuo.shi/anaconda3/envs/PY3.7/lib/python3.7/site-packages/SigProfilerClusters/hotspot.py", line 706, in calculateSampleIMDs
    regions = densityCorrection(densityMuts, densityMutsSim, windowSize)
  File "/share/home/shuo.shi/anaconda3/envs/PY3.7/lib/python3.7/site-packages/SigProfilerClusters/hotspot.py", line 543, in densityCorrection
    sims = random.sample(list(densityMutsSim.keys()), 10)
  File "/share/home/shuo.shi/anaconda3/envs/PY3.7/lib/python3.7/random.py", line 321, in sample
    raise ValueError("Sample larger than population or is negative")
ValueError: Sample larger than population or is negative
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "../cluster.ID.py", line 9, in <module>
    hp.analysis(sys.argv[1], "GRCh37", "ID", ["ID"], sys.argv[2], analysis="all", interdistance='ID',sortSims=True, subClassify=True, correction=True, calculateIMD=True, max_cpu=1, TCGA=True, sanger=False, exome=True)
  File "/share/home/shuo.shi/anaconda3/envs/PY3.7/lib/python3.7/site-packages/SigProfilerClusters/SigProfilerClusters.py", line 669, in analysis
    regions, imds = hotspot.hotSpotAnalysis(project, genome, contexts, simContext, ref_dir, windowSize, processors, plotIMDfigure, exome, chromLengths, binsDensity, original, signature, percentage, firstRun, clustering_vaf, calculateIMD, chrom_based, correction)
  File "/share/home/shuo.shi/anaconda3/envs/PY3.7/lib/python3.7/site-packages/SigProfilerClusters/hotspot.py", line 1058, in hotSpotAnalysis
    r.get()
  File "/share/home/shuo.shi/anaconda3/envs/PY3.7/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
ValueError: Sample larger than population or is negative

ebergstr commented 2 years ago

Hi,

I have run your example file; however, given the small number of events, there are no statistically significant clustered events present (with so few events, the small amount of clustering observed could still have happened by chance, so these events are not classified as confident clustered events). Thus, you should not expect to see any cluster info in the clustered/ID/test.vcf file. Additionally, if you expect clustered mutations, you also need to set subClassify=True to create these outputs.

The error that you are experiencing is also attributable to having too few mutations for this analysis. To help in running and understanding the indel analysis, I have now uploaded a new breast cancer sample that has only indels to the OSF wiki page. I recommend running this sample to see the expected output.
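The ValueError itself comes from Python's `random.sample`, which refuses to draw more items than the population holds. The traceback shows 10 keys being sampled from `densityMutsSim`, so the error implies that fewer than 10 keys were available. The snippet below is a minimal reproduction with made-up keys; `sample_simulations` is a hypothetical guard written only for illustration and is not the tool's actual fix.

```python
import random


def sample_simulations(sim_keys, k=10):
    """Hypothetical guard: never request more items than are available."""
    keys = list(sim_keys)
    return random.sample(keys, min(k, len(keys)))


# Reproduce the failure: ask for 10 items from a population of 3.
error_message = None
try:
    random.sample(["sim1", "sim2", "sim3"], 10)
except ValueError as err:
    error_message = str(err)

# The guarded version returns at most what is available.
small = sample_simulations(["sim1", "sim2", "sim3"])
large = sample_simulations(range(50))
```

Here `small` ends up with all 3 keys and `large` with 10, whereas the unguarded call raises the same ValueError seen in the log.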

Please try the example file, and let us know if this still gives you issues.

Thanks.

shishuo16 commented 2 years ago

Hi, I used the BRCA-EU_SP117344.indel.vcf file and the code below, in which subClassify=True. However, the result file "example/output/clustered/ID/BRCA-EU_SP117344.vcf" still does not contain cluster info (groupNumber info). P.S. "ID/BRCA-EU_SP117344.vcf" is the only file under "example/output/clustered/".

import sys
from SigProfilerMatrixGenerator import install as genInstall
from SigProfilerSimulator import SigProfilerSimulator as sigSim
from SigProfilerClusters import SigProfilerClusters as hp
if __name__ == '__main__':
    sigSim.SigProfilerSimulator("cancer", "./example/", "GRCh37", contexts = ['ID'], simulations=100)
    hp.analysis("cancer", "GRCh37", "ID", ["ID"], "./example/", analysis="all", interdistance='ID',sortSims=True, subClassify=True, correction=True, calculateIMD=True, max_cpu=1, TCGA=True, sanger=False)

First 10 lines of the "example/output/clustered/ID/BRCA-EU_SP117344.vcf" file:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
X   13755496    BRCA-EU_SP117344    GTT G   .   .   ClusteredAnalysis
X   13756920    BRCA-EU_SP117344    ACTGATATAAAAATATGTCTACTTAGTAACCTATTTTTTCTTAAGGTATTTACTATGCAGGATCTATTACAACTCATTAAAATCAACCCTACTTCCAGT A   .   .   ClusteredAnalysis
X   48399674    BRCA-EU_SP117344    ATGGTGATGGGGCCCC    A   .   .   ClusteredAnalysis
X   48400142    BRCA-EU_SP117344    GACTTAGGATTTTTCA    G   .   .   ClusteredAnalysis
1   193426800   BRCA-EU_SP117344    AT  A   .   .   ClusteredAnalysis
1   193427119   BRCA-EU_SP117344    GGAAAGACTACT    G   .   .   ClusteredAnalysis
2   17656445    BRCA-EU_SP117344    TCATATTATTTAAAAACATAAC  T   .   .   ClusteredAnalysis
2   17657570    BRCA-EU_SP117344    TATGG   T   .   .   ClusteredAnalysis
2   179799847   BRCA-EU_SP117344    CTT C   .   .   ClusteredAnalysis

ebergstr commented 2 years ago

Hi,

This is the expected output as we do not provide group numbers for clustered indels. I will close this issue now that you appear to be getting the expected output. Please feel free to reopen, or open a new issue if you experience additional issues.
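Since the indel output carries no group numbers, one rough way to see which records sit close together is to compute the distance between adjacent positions on each chromosome. The sketch below does this for a few of the records quoted above. `adjacent_distances` is a hypothetical helper, and a plain difference of sorted positions is an assumption about what "inter-mutational distance" means here, not necessarily the tool's own definition.

```python
from collections import defaultdict


def adjacent_distances(vcf_lines):
    """Return, per chromosome, the gaps between consecutive sorted positions."""
    positions = defaultdict(list)
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos = line.split()[:2]
        positions[chrom].append(int(pos))
    distances = {}
    for chrom, pos_list in positions.items():
        pos_list.sort()
        distances[chrom] = [b - a for a, b in zip(pos_list, pos_list[1:])]
    return distances


# Records taken from the output quoted earlier in this thread.
records = [
    "#CHROM  POS ID  REF ALT QUAL    FILTER  INFO",
    "X   13755496    BRCA-EU_SP117344    GTT G   .   .   ClusteredAnalysis",
    "X   48399674    BRCA-EU_SP117344    ATGGTGATGGGGCCCC    A   .   .   ClusteredAnalysis",
    "X   48400142    BRCA-EU_SP117344    GACTTAGGATTTTTCA    G   .   .   ClusteredAnalysis",
    "1   193426800   BRCA-EU_SP117344    AT  A   .   .   ClusteredAnalysis",
    "1   193427119   BRCA-EU_SP117344    GGAAAGACTACT    G   .   .   ClusteredAnalysis",
]
distances = adjacent_distances(records)
```

For these records the two indels on chromosome 1 are 319 bp apart, and the last two on chromosome X are 468 bp apart.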

Thank you! -Erik