Closed shishuo16 closed 2 years ago
Hi,
This is caused by your sample(s) not having any clustered mutations. I have updated the tool (version 1.0.9) to print a message notifying the user when no clustered mutations are found. Please update your version of the tool.
The example BRCA sample does not contain indels, which is why you are seeing this issue with that file.
Thanks!
Hi, thanks for your quick reply. I updated SigProfilerClusters to version 1.0.9 as you suggested and ran the code with a VCF that contains indels. This time it seems to work. However, when I check the result file "output/clustered/ID/test.vcf", I cannot find the cluster info.
#CHROM POS ID REF ALT QUAL FILTER INFO
X 15429073 test T CTC . . ClusteredAnalysis
X 15429355 test G ATC . . ClusteredAnalysis
X 16603461 test C GTC . . ClusteredAnalysis
X 16603493 test C ATC . . ClusteredAnalysis
X 16603885 test C GTC . . ClusteredAnalysis
X 16603954 test C GTC . . ClusteredAnalysis
X 16604434 test C TTC . . ClusteredAnalysis
X 16604571 test C TTC . . ClusteredAnalysis
X 16606097 test C GTC . . ClusteredAnalysis
How so?
Also, when I set exome=True, there is no notification about missing clustered mutations. Instead, it writes an error message to the SigProfilerClusters_cancer_GRCh37_2022-04-27.err file, as below:
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/share/home/shuo.shi/anaconda3/envs/PY3.7/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/share/home/shuo.shi/anaconda3/envs/PY3.7/lib/python3.7/site-packages/SigProfilerClusters/hotspot.py", line 706, in calculateSampleIMDs
regions = densityCorrection(densityMuts, densityMutsSim, windowSize)
File "/share/home/shuo.shi/anaconda3/envs/PY3.7/lib/python3.7/site-packages/SigProfilerClusters/hotspot.py", line 543, in densityCorrection
sims = random.sample(list(densityMutsSim.keys()), 10)
File "/share/home/shuo.shi/anaconda3/envs/PY3.7/lib/python3.7/random.py", line 321, in sample
raise ValueError("Sample larger than population or is negative")
ValueError: Sample larger than population or is negative
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "../cluster.ID.py", line 9, in <module>
hp.analysis(sys.argv[1], "GRCh37", "ID", ["ID"], sys.argv[2], analysis="all", interdistance='ID',sortSims=True, subClassify=True, correction=True, calculateIMD=True, max_cpu=1, TCGA=True, sanger=False, exome=True)
File "/share/home/shuo.shi/anaconda3/envs/PY3.7/lib/python3.7/site-packages/SigProfilerClusters/SigProfilerClusters.py", line 669, in analysis
regions, imds = hotspot.hotSpotAnalysis(project, genome, contexts, simContext, ref_dir, windowSize, processors, plotIMDfigure, exome, chromLengths, binsDensity, original, signature, percentage, firstRun, clustering_vaf, calculateIMD, chrom_based, correction)
File "/share/home/shuo.shi/anaconda3/envs/PY3.7/lib/python3.7/site-packages/SigProfilerClusters/hotspot.py", line 1058, in hotSpotAnalysis
r.get()
File "/share/home/shuo.shi/anaconda3/envs/PY3.7/lib/python3.7/multiprocessing/pool.py", line 657, in get
raise self._value
ValueError: Sample larger than population or is negative
Hi,
I have run your example file; however, given the small number of events, there are no statistically significant clustered events present (with so few events, the observed clustering could still have arisen by chance, so these events are not classified as confident clustered events). Thus, you should not expect to see any cluster info in the clustered/ID/test.vcf file. Additionally, if you expect clustered mutations, you also need to set subClassify=True to create these outputs.
The error that you are experiencing is also attributed to having too few mutations for this analysis. To help in running and understanding the indel analysis, I have now uploaded a new breast cancer sample that has only indels to the OSF wiki page. I recommend running this sample to see the expected output.
Please try the example file, and let us know if this still gives you issues.
Thanks.
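For reference, the ValueError in the traceback above comes from `random.sample` being asked for 10 items from a shorter list. The following is a minimal sketch of that failure and of one possible guard; `pick_simulations` is a hypothetical helper written for illustration and is not part of SigProfilerClusters.

```python
import random


def pick_simulations(density_muts_sim, k=10):
    """Hypothetical guard: return k random keys, or None if fewer than k exist."""
    keys = list(density_muts_sim.keys())
    if len(keys) < k:
        # Too few simulated entries to draw k of them; let the caller decide.
        return None
    return random.sample(keys, k)


# The unguarded call fails when the dictionary holds fewer than 10 keys,
# which is the situation reported in the traceback.
raised = False
try:
    random.sample(list({"sim1": [], "sim2": []}.keys()), 10)
except ValueError:
    raised = True
```

With a guard of this kind, a sample with too few mutations would yield `None` instead of an exception, and the caller could report that no clustered mutations were found.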
Hi, I used the BRCA-EU_SP117344.indel.vcf file and the code below, with subClassify=True. However, the result file "example/output/clustered/ID/BRCA-EU_SP117344.vcf" still does not contain cluster info (groupNumber info). P.S. "ID/BRCA-EU_SP117344.vcf" is the only file under "example/output/clustered/".
import sys
from SigProfilerMatrixGenerator import install as genInstall
from SigProfilerSimulator import SigProfilerSimulator as sigSim
from SigProfilerClusters import SigProfilerClusters as hp
if __name__ == '__main__':
    # Simulate the indel (ID) context 100 times for the "cancer" project.
    sigSim.SigProfilerSimulator("cancer", "./example/", "GRCh37", contexts=['ID'], simulations=100)
    # Run the clustering analysis against the simulations generated above.
    hp.analysis("cancer", "GRCh37", "ID", ["ID"], "./example/", analysis="all", interdistance='ID', sortSims=True, subClassify=True, correction=True, calculateIMD=True, max_cpu=1, TCGA=True, sanger=False)
First 10 lines of the "example/output/clustered/ID/BRCA-EU_SP117344.vcf" file:
#CHROM POS ID REF ALT QUAL FILTER INFO
X 13755496 BRCA-EU_SP117344 GTT G . . ClusteredAnalysis
X 13756920 BRCA-EU_SP117344 ACTGATATAAAAATATGTCTACTTAGTAACCTATTTTTTCTTAAGGTATTTACTATGCAGGATCTATTACAACTCATTAAAATCAACCCTACTTCCAGT A . . ClusteredAnalysis
X 48399674 BRCA-EU_SP117344 ATGGTGATGGGGCCCC A . . ClusteredAnalysis
X 48400142 BRCA-EU_SP117344 GACTTAGGATTTTTCA G . . ClusteredAnalysis
1 193426800 BRCA-EU_SP117344 AT A . . ClusteredAnalysis
1 193427119 BRCA-EU_SP117344 GGAAAGACTACT G . . ClusteredAnalysis
2 17656445 BRCA-EU_SP117344 TCATATTATTTAAAAACATAAC T . . ClusteredAnalysis
2 17657570 BRCA-EU_SP117344 TATGG T . . ClusteredAnalysis
2 179799847 BRCA-EU_SP117344 CTT C . . ClusteredAnalysis
Hi,
This is the expected output as we do not provide group numbers for clustered indels. I will close this issue now that you appear to be getting the expected output. Please feel free to reopen, or open a new issue if you experience additional issues.
Thank you! -Erik
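To check quickly whether any record in a clustered VCF carries something other than the `ClusteredAnalysis` tag, the INFO column can be tallied. This is a rough sketch that assumes whitespace-separated records with the eight columns shown in the header above; `tally_info` is a hypothetical helper, not a SigProfilerClusters function.

```python
from collections import Counter


def tally_info(vcf_lines):
    """Count the distinct values in the INFO (8th) column of VCF-like lines."""
    counts = Counter()
    for line in vcf_lines:
        # Skip blank lines and header lines such as "#CHROM POS ID ...".
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 8:
            counts[fields[7]] += 1
    return counts


records = [
    "#CHROM POS ID REF ALT QUAL FILTER INFO",
    "X 15429073 test T CTC . . ClusteredAnalysis",
    "X 15429355 test G ATC . . ClusteredAnalysis",
    "X 16603461 test C GTC . . ClusteredAnalysis",
]
info_counts = tally_info(records)
```

If the only key in the result is `ClusteredAnalysis`, the file contains no per-record group information, which matches the output described for indels.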
Hi, when I used SigProfilerClusters to generate clustered indel results using the code below, I got an error message in the "SigProfilerClusters_cancer_GRCh37_2022-04-25.err" file:
Traceback (most recent call last):
File "/share/home/shuo.shi/APOBEC/SigProfiler//script/cluster.ID.py", line 7, in <module>
hp.analysis(sys.argv[1], "GRCh37", "ID", ["ID"], sys.argv[2], analysis="all", interdistance='ID', sortSims=True, subClassify=True, correction=True, calculateIMD=True, max_cpu=1, TCGA=True, sanger=False,exome=True)
File "/share/home/shuo.shi/anaconda3/lib/python3.9/site-packages/SigProfilerClusters/SigProfilerClusters.py", line 590, in analysis
regions, imds = hotspot.hotSpotAnalysis(project, genome, contexts, simContext, ref_dir, windowSize, processors, plotIMDfigure, exome, chromLengths, binsDensity, original, signature, percentage, firstRun, clustering_vaf, calculateIMD, chrom_based, correction)
File "/share/home/shuo.shi/anaconda3/lib/python3.9/site-packages/SigProfilerClusters/hotspot.py", line 1151, in hotSpotAnalysis
with open(ref_dir + "output/vcf_files" + path_suffix + "/" + project + "_clustered/INDEL/output/ID/" + project + "_clustered.ID83.all" ) as f:
FileNotFoundError: [Errno 2] No such file or directory: './Allen_Pat110_ID/output/vcf_files_corrected/cancer_clustered/INDEL/output/ID/cancer_clustered.ID83.all'
I also tried the BRCA_example.vcf (https://osf.io/qpmzw/wiki/5.%20Quick%20Start%20Example/) with the same parameter settings, as below. However, this time I got an error message in the SigProfilerSimulator_BRCA_GRCh37_2022-04-26.err file:
Traceback (most recent call last):
File "/share/home/shuo.shi/APOBEC/SigProfiler/cluster.ID.example.py", line 6, in <module>
sigSim.SigProfilerSimulator("BRCA", "./example/", "GRCh37", contexts = ["ID"], simulations=100)
File "/share/home/shuo.shi/anaconda3/lib/python3.9/site-packages/SigProfilerSimulator/SigProfilerSimulator.py", line 395, in SigProfilerSimulator
sample_names, mut_prep = simScript.mutation_preparation(catalogue_files, log_file)
File "/share/home/shuo.shi/anaconda3/lib/python3.9/site-packages/SigProfilerSimulator/mutational_simulator.py", line 498, in mutation_preparation
with open (catalogue_files[context]) as f:
FileNotFoundError: [Errno 2] No such file or directory: './example/output/ID/BRCA.ID83.all'
Please let me know how to fix it.
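Both FileNotFoundError messages above point at a missing `.ID83.all` matrix file. As a minimal sketch, the path from the second traceback can be checked before starting the simulator. The layout `<ref_dir>/output/ID/<project>.ID83.all` is taken from that error message, and the helper names are hypothetical, not part of SigProfilerSimulator.

```python
from pathlib import Path


def expected_id_matrix(ref_dir, project):
    """Build the ID83 matrix path that appears in the simulator error message."""
    return Path(ref_dir) / "output" / "ID" / f"{project}.ID83.all"


def has_id_matrix(ref_dir, project):
    """Return True only if the ID83 matrix file is present on disk."""
    return expected_id_matrix(ref_dir, project).is_file()


if __name__ == "__main__":
    # Project and directory names as used in the traceback above.
    if not has_id_matrix("./example/", "BRCA"):
        print("Missing:", expected_id_matrix("./example/", "BRCA"))
```

If the file is missing, one likely reason (an assumption here) is that the input VCF contains no indels, so no ID matrix was ever written.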