Closed: kbrar4013 closed this issue 5 months ago
Hi @kbrar4013,
Thanks for reaching out! Could you please share how you run both the SigProfilerSimulator and SigProfilerClusters tools on your end? Please also share the log files (.err and .out) for SigProfilerClusters.
Best, Mousumy
Hi,
Thanks for your response!
Here is the command for SigProfilerSimulator:
from SigProfilerSimulator import SigProfilerSimulator as sigSim
sigSim.SigProfilerSimulator("MOCHA_0324_subs", "/home/kbrar/MOCHA_Jan_30_2024/Somatic_SNV_vcf/unzipped_SNV_vcfs/info_SNV_vcfs/", "GRCh38", contexts=['96'], simulations=100, chrom_based=True, vcf=True)
and SigProfilerClusters:
from SigProfilerClusters import SigProfilerClusters as hp
hp.analysis("MOCHA_Mar2024_subs", "GRCh38", "96", ["96"], "home/kbrar/MOCHA_Jan_30_2024/Somatic_SNV_vcf/unzipped_SNV_vcfs/info_SNV_vcfs/", analysis="all", sortSims=True, subClassify=True, includedVAFs=False, includedCCFs=False)
The error file for SigProfilerClusters is empty. The log file is below:
THIS FILE CONTAINS THE METADATA ABOUT SYSTEM AND RUNTIME
-------System Info-------
Operating System Name: Linux
Nodename: Tink5
Release: 4.15.0-213-generic
Version: #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023
-------Python and Package Versions-------
Python Version: 3.9.0
SigProfilerMatrixGenerator Version: 1.2.25
SigProfilerPlotting version: 1.3.20
matplotlib version: 3.4.3
scipy version: 1.12.0
numpy version: 1.26.4
-------Vital Parameters Used for the execution -------
Project: MOCHA_0324_subs
Genome: GRCh38
Context: ['96']
interdistance: False
input_path: /home/kbrar/MOCHA_Jan_30_2024/Somatic_SNV_vcf/unzipped_SNV_vcfs/info_SNV_vcfs/
output_type: all
-------Date and Time Data-------
Date and Clock time when the execution started: 2024-03-15 11:06:30.715182
-------Runtime Checkpoints-------
Thanks again!
Hi @kbrar4013,
Thanks for sharing!
Could you please rerun SigProfilerSimulator with vcf=False? For example:
from SigProfilerSimulator import SigProfilerSimulator as sigSim
sigSim.SigProfilerSimulator("MOCHA_0324_subs", "/home/kbrar/MOCHA_Jan_30_2024/Somatic_SNV_vcf/unzipped_SNV_vcfs/info_SNV_vcfs/", "GRCh38", contexts=['96'], simulations=100, chrom_based=True, vcf=False)
Then run SigProfilerClusters the same way you did before. Please let me know if you run into any issues.
Best, Mousumy
Hey,
Thanks for your help! The tool does now run successfully, but I appear to have a different issue - the resulting output is not as expected. The folders "clustered" and "nonclustered" do not exist:
In addition, there is no rainfall plot in the "plots" folder. Further, in the "vcf_files_corrected" folder, the "clustered" subfolder appears to contain the "deprecated" files as opposed to the file expected as described here: https://osf.io/qpmzw/wiki/4.%20Output/
Looking at the SigProfilerClusters error file, here's the output:
Traceback (most recent call last):
File "
Any assistance would be greatly appreciated. Thank you so much for your prompt responses and help!!
Hi, I'll also just attach the log file for SigProfilerClusters:
-------Python and Package Versions-------
Python Version: 3.9.0
SigProfilerMatrixGenerator Version: 1.2.25
SigProfilerPlotting version: 1.3.20
matplotlib version: 3.4.3
scipy version: 1.12.0
numpy version: 1.26.4
-------Vital Parameters Used for the execution -------
Project: MOCHA_0324_subs
Genome: GRCh38
Context: ['96']
interdistance: False
input_path: /home/kbrar/MOCHA_Jan_30_2024/Somatic_SNV_vcf/unzipped_SNV_vcfs/info_SNV_vcfs/
output_type: all
-------Date and Time Data-------
Date and Clock time when the execution started: 2024-03-26 15:50:56.093666
-------Runtime Checkpoints-------
Starting matrix generation for SNVs and DINUCs...
Starting matrix generation for SNVs and DINUCs...
Starting matrix generation for SNVs and DINUCs...
Starting matrix generation for SNVs and DINUCs...
Starting matrix generation for SNVs and DINUCs...
Starting matrix generation for SNVs and DINUCs...
Starting matrix generation for SNVs and DINUCs...
Starting matrix generation for SNVs and DINUCs...
Starting matrix generation for SNVs and DINUCs...
Completed! Elapsed time: 2.76 seconds. Matrices generated for 51 samples with 0 errors. Total of 719 SNVs, 10 DINUCs, and 0 INDELs were successfully analyzed.
Completed! Elapsed time: 2.82 seconds. Matrices generated for 47 samples with 0 errors. Total of 552 SNVs, 16 DINUCs, and 0 INDELs were successfully analyzed.
Completed! Elapsed time: 2.85 seconds. Matrices generated for 38 samples with 0 errors. Total of 1801 SNVs, 76 DINUCs, and 0 INDELs were successfully analyzed.
Completed! Elapsed time: 2.87 seconds. Matrices generated for 72 samples with 0 errors. Total of 1184 SNVs, 0 DINUCs, and 0 INDELs were successfully analyzed.
Completed! Elapsed time: 3.5 seconds. Matrices generated for 85 samples with 0 errors. Total of 6183 SNVs, 78 DINUCs, and 0 INDELs were successfully analyzed.
Completed! Elapsed time: 3.78 seconds. Matrices generated for 88 samples with 0 errors. Total of 9255 SNVs, 180 DINUCs, and 0 INDELs were successfully analyzed.
Completed! Elapsed time: 6.37 seconds. Matrices generated for 92 samples with 0 errors. Total of 24380 SNVs, 12190 DINUCs, and 0 INDELs were successfully analyzed.
Completed! Elapsed time: 6.61 seconds. Matrices generated for 92 samples with 0 errors. Total of 49833 SNVs, 411 DINUCs, and 0 INDELs were successfully analyzed.
Completed! Elapsed time: 10.23 seconds. Matrices generated for 92 samples with 0 errors. Total of 75397 SNVs, 12601 DINUCs, and 0 INDELs were successfully analyzed.
Hi @kbrar4013 ,
Thanks for sharing!
Could you share one of your example files so that I can run it on my end? In the meantime, I suggest removing the previous log and output files and then re-running both SigProfilerSimulator and SigProfilerClusters.
Best, Mousumy
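For anyone following along, here is a minimal sketch of how the stale log files could be located before a re-run. It is only an illustration: the helper name `find_stale_logs` is made up, the `.err`/`.out` suffixes are taken from the file types mentioned earlier in this thread, and the folder to scan is an assumption you would need to adapt to wherever your run wrote its logs. It only lists files and deletes nothing.

```python
from pathlib import Path


def find_stale_logs(root, suffixes=(".err", ".out")):
    """Return a sorted list of files under `root` whose suffix is in `suffixes`.

    Hypothetical helper: it only lists candidates and never deletes anything.
    """
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix in suffixes
    )
```

You could print the result, review it, and then remove the files yourself (for example with `Path.unlink()`) once you are sure they belong to the earlier run.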
Hi, sure I will email you the file! Thanks
Hi @kbrar4013 ,
I suggested a solution by email but haven't heard back from you. Please reopen the issue if you run into any further problems.
Best, Mousumy
Hi,
I'm trying to run SigProfilerClusters on a set of VCF files. I've successfully run SigProfilerSimulator on these files with 100 simulations, and the log file is pasted below:
-------Python and Package Versions-------
Python Version: 3.9.0
SigProfilerSimulator Version: 1.1.5
SigProfilerMatrixGenerator Version: 1.2.25
numpy version: 1.26.4
-------Vital Parameters Used for the execution -------
Project: MOCHA_0324_subs
Genome: GRCh38
Input File Path: /home/kbrar/MOCHA_Jan_30_2024/Somatic_SNV_vcf/unzipped_SNV_vcfs/info_SNV_vcfs/
contexts: ['96']
exome: None
simulations: 100
updating: False
bed_file: None
overlap: False
gender: female
seqInfo: False
chrom_based: True
seed_file: None
-------Date and Time Data-------
Date and Clock time when the execution started: 2024-03-13 13:12:47.389505
-------Seeds for random number generation per process-------
Process 0: 0
Process 1: 1
Process 2: 1
Process 3: 0
Process 4: 1
Process 5: 2
Process 6: 0
Process 7: 0
Process 8: 1
Process 9: 0
Process 10: 1
Process 11: 1
Process 12: 0
Process 13: 0
Process 14: 1
Process 15: 0
Process 16: 2
Process 17: 0
Process 18: 2
Process 19: 1
Process 20: 0
Process 21: 1
Process 22: 0
-------Runtime Checkpoints-------
Chromosome 22 done
Chromosome 21 done
Chromosome 17 done
Chromosome 19 done
Chromosome 20 done
Chromosome 16 done
Chromosome 15 done
Chromosome 18 done
Chromosome 14 done
Chromosome 9 done
Chromosome 12 done
Chromosome X done
Chromosome 10 done
Chromosome 11 done
Chromosome 13 done
Chromosome 6 done
Chromosome 7 done
Chromosome 5 done
Chromosome 3 done
Chromosome 8 done
Chromosome 4 done
Chromosome 1 done
Chromosome 2 done
Simulation completed
Job took 1650.7996301651 seconds
When I then try to run SigProfilerClusters with this same folder path, I get the following message and it exits:
Not sure where to go from here, as I've run 100 simulations on every sample, and have confirmed this as there are 100 files in each sample's folder in the "simulations" output folder. Any help would be appreciated, thanks! And thanks again for creating this very interesting tool.
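In case it helps others reproduce the check described above, this is roughly how the per-sample file count could be verified in a script rather than by eye. It is a sketch under assumptions: the function names are hypothetical, and it assumes the "simulations" folder contains one sub-folder per sample with the simulated files directly inside it, which may not match every SigProfilerSimulator output layout.

```python
from pathlib import Path


def count_simulation_files(simulations_dir):
    """Map each sample sub-folder name to the number of files directly inside it.

    Assumes one sub-folder per sample (an assumption about the output layout).
    """
    root = Path(simulations_dir)
    return {
        sample.name: sum(1 for f in sample.iterdir() if f.is_file())
        for sample in sorted(root.iterdir())
        if sample.is_dir()
    }


def samples_with_wrong_count(simulations_dir, expected=100):
    """Return only the samples whose file count differs from `expected`."""
    counts = count_simulation_files(simulations_dir)
    return {name: n for name, n in counts.items() if n != expected}
```

An empty dictionary from `samples_with_wrong_count(...)` would mean every sample folder holds the expected number of files (100 in this run).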