AlexandrovLab / SigProfilerClusters

Tool for analyzing the inter-mutational distances between SNV-SNV and INDEL-INDEL mutations. Tool separates mutations into clustered and non-clustered groups on a sample-dependent basis.
BSD 2-Clause "Simplified" License

Error with running SigProfilerClusters after SigProfilerSimulator #22

Closed kbrar4013 closed 5 months ago

kbrar4013 commented 7 months ago

Hi,

I'm trying to run SigProfilerClusters on a set of VCF files. I've successfully run SigProfilerSimulator on these files with 100 simulations, and the log file is pasted below:

-------Python and Package Versions-------
Python Version: 3.9.0
SigProfilerSimulator Version: 1.1.5
SigProfilerMatrixGenerator Version: 1.2.25
numpy version: 1.26.4

-------Vital Parameters Used for the execution -------
Project: MOCHA_0324_subs
Genome: GRCh38
Input File Path: /home/kbrar/MOCHA_Jan_30_2024/Somatic_SNV_vcf/unzipped_SNV_vcfs/info_SNV_vcfs/
contexts: ['96']
exome: None
simulations: 100
updating: False
bed_file: None
overlap: False
gender: female
seqInfo: False
chrom_based: True
seed_file: None

-------Date and Time Data-------
Date and Clock time when the execution started: 2024-03-13 13:12:47.389505

-------Seeds for random number generation per process-------
Process 0: 0
Process 1: 1
Process 2: 1
Process 3: 0
Process 4: 1
Process 5: 2
Process 6: 0
Process 7: 0
Process 8: 1
Process 9: 0
Process 10: 1
Process 11: 1
Process 12: 0
Process 13: 0
Process 14: 1
Process 15: 0
Process 16: 2
Process 17: 0
Process 18: 2
Process 19: 1
Process 20: 0
Process 21: 1
Process 22: 0

-------Runtime Checkpoints-------
Chromosome 22 done
Chromosome 21 done
Chromosome 17 done
Chromosome 19 done
Chromosome 20 done
Chromosome 16 done
Chromosome 15 done
Chromosome 18 done
Chromosome 14 done
Chromosome 9 done
Chromosome 12 done
Chromosome X done
Chromosome 10 done
Chromosome 11 done
Chromosome 13 done
Chromosome 6 done
Chromosome 7 done
Chromosome 5 done
Chromosome 3 done
Chromosome 8 done
Chromosome 4 done
Chromosome 1 done
Chromosome 2 done
Simulation completed
Job took 1650.7996301651 seconds

When I then try to run SigProfilerClusters with this same folder path, I get the following message and it exits:

[Image: Screenshot 2024-03-14 at 2 55 44 PM]

Not sure where to go from here, as I've run 100 simulations on every sample, and have confirmed this as there are 100 files in each sample's folder in the "simulations" output folder. Any help would be appreciated, thanks! And thanks again for creating this very interesting tool.
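The per-sample file count described above can be checked with a short script instead of by hand. This is a minimal sketch, assuming the simulator writes one subfolder per sample under a "simulations" directory; the helper names `count_simulation_files` and `folders_not_matching` are hypothetical and are not part of SigProfilerSimulator.

```python
from pathlib import Path


def count_simulation_files(simulations_dir):
    """Return {subfolder name: number of regular files} for each subfolder."""
    root = Path(simulations_dir)
    counts = {}
    # Assumed layout: one subfolder per sample, one file per simulation.
    for sample_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[sample_dir.name] = sum(1 for f in sample_dir.iterdir() if f.is_file())
    return counts


def folders_not_matching(counts, expected):
    """Return the subfolder names whose file count differs from `expected`."""
    return sorted(name for name, n in counts.items() if n != expected)
```

Calling `folders_not_matching(count_simulation_files(path), 100)` on the simulations folder would list any sample folder that does not hold exactly 100 files; an empty list means every folder matches.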

MousumyCSE commented 7 months ago

Hi @kbrar4013,

Thanks for reaching out! Could you please share how you ran both the SigProfilerSimulator and SigProfilerClusters tools on your end? Please also share the log files (.err and .out) for SigProfilerClusters.

Best, Mousumy

kbrar4013 commented 7 months ago

Hi,

Thanks for your response!

Here is the command for SigProfilerSimulator:

from SigProfilerSimulator import SigProfilerSimulator as sigSim
sigSim.SigProfilerSimulator("MOCHA_0324_subs", "/home/kbrar/MOCHA_Jan_30_2024/Somatic_SNV_vcf/unzipped_SNV_vcfs/info_SNV_vcfs/", "GRCh38", contexts=['96'], simulations=100, chrom_based=True, vcf=True)

and SigProfilerClusters:

from SigProfilerClusters import SigProfilerClusters as hp
hp.analysis("MOCHA_Mar2024_subs", "GRCh38", "96", ["96"], "home/kbrar/MOCHA_Jan_30_2024/Somatic_SNV_vcf/unzipped_SNV_vcfs/info_SNV_vcfs/", analysis="all", sortSims=True, subClassify=True, includedVAFs=False, includedCCFs=False)

The error file for SigProfilerClusters is empty. The log file is below:

THIS FILE CONTAINS THE METADATA ABOUT SYSTEM AND RUNTIME

-------System Info-------
Operating System Name: Linux
Nodename: Tink5
Release: 4.15.0-213-generic
Version: #224-Ubuntu SMP Mon Jun 19 13:30:12 UTC 2023

-------Python and Package Versions-------
Python Version: 3.9.0
SigProfilerMatrixGenerator Version: 1.2.25
SigProfilerPlotting version: 1.3.20
matplotlib version: 3.4.3
scipy version: 1.12.0
numpy version: 1.26.4

-------Vital Parameters Used for the execution -------
Project: MOCHA_0324_subs
Genome: GRCh38
Context: ['96']
interdistance: False
input_path: /home/kbrar/MOCHA_Jan_30_2024/Somatic_SNV_vcf/unzipped_SNV_vcfs/info_SNV_vcfs/
output_type: all

-------Date and Time Data-------
Date and Clock time when the execution started: 2024-03-15 11:06:30.715182

-------Runtime Checkpoints-------

Thanks again!
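As pasted, the two commands use different project names ("MOCHA_0324_subs" and "MOCHA_Mar2024_subs"), and the SigProfilerClusters path has no leading slash, while the log reports the same project and an absolute path, so the paste may simply not match what was run. A small pre-flight comparison can rule this out. This is a sketch under the assumption, not confirmed in this thread, that SigProfilerClusters locates the simulations through the same project name and input path; `check_run_arguments` is a hypothetical helper.

```python
import os


def check_run_arguments(sim_project, sim_path, clusters_project, clusters_path):
    """Return a list of warnings; an empty list means no mismatch was found."""
    warnings = []
    if sim_project != clusters_project:
        warnings.append(f"project names differ: {sim_project!r} vs {clusters_project!r}")
    for label, path in (("simulator", sim_path), ("clusters", clusters_path)):
        if not os.path.isabs(path):
            warnings.append(f"{label} path is relative: {path!r}")
    # normpath ignores a trailing slash, so "/x/" and "/x" compare equal.
    if os.path.normpath(sim_path) != os.path.normpath(clusters_path):
        warnings.append("input paths differ")
    return warnings


# The argument values below are copied from the two commands as pasted above.
issues = check_run_arguments(
    "MOCHA_0324_subs",
    "/home/kbrar/MOCHA_Jan_30_2024/Somatic_SNV_vcf/unzipped_SNV_vcfs/info_SNV_vcfs/",
    "MOCHA_Mar2024_subs",
    "home/kbrar/MOCHA_Jan_30_2024/Somatic_SNV_vcf/unzipped_SNV_vcfs/info_SNV_vcfs/",
)
for issue in issues:
    print(issue)
```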

MousumyCSE commented 7 months ago

Hi @kbrar4013,

Thanks for sharing!

Could you run the SigProfilerSimulator tool with vcf=False? For example:

from SigProfilerSimulator import SigProfilerSimulator as sigSim
sigSim.SigProfilerSimulator("MOCHA_0324_subs", "/home/kbrar/MOCHA_Jan_30_2024/Somatic_SNV_vcf/unzipped_SNV_vcfs/info_SNV_vcfs/", "GRCh38", contexts=['96'], simulations=100, chrom_based=True, vcf=False)

Then run the SigProfilerClusters tool the same way you did before. Please let me know if you run into any issues.

Best, Mousumy

kbrar4013 commented 7 months ago

Hey,

Thanks for your help! The tool does now run successfully, but I appear to have a different issue - the resulting output is not as expected. The folders "clustered" and "nonclustered" do not exist:

[Image: Screenshot 2024-03-27 at 1 33 39 PM]

In addition, there is no rainfall plot in the "plots" folder. Further, in the "vcf_files_corrected" folder, the "clustered" subfolder appears to contain the "deprecated" files as opposed to the file expected as described here: https://osf.io/qpmzw/wiki/4.%20Output/

[Image: Screenshot 2024-03-27 at 1 34 47 PM]

Looking at the SigProfilerClusters error file, here's the output:

Traceback (most recent call last):
  File "", line 1, in
  File "/home/kbrar/miniforge3/envs/sigprofilerclusters/lib/python3.9/site-packages/SigProfilerClusters/SigProfilerClusters.py", line 718, in analysis
    plottingFunctions.rainfall(chrom_based, project, input_path, chrom_path, chromLengths, centromeres, contexts, includedVAFs, includedCCFs, correction, windowSize, bedRanges)
  File "/home/kbrar/miniforge3/envs/sigprofilerclusters/lib/python3.9/site-packages/SigProfilerClusters/plottingFunctions.py", line 1109, in rainfall
    newMutations = pd.read_csv(mutationsPath[i], sep="\t", names=["project", "samples","ID","genome","mutType","chr","start","end", "ref", "alt", "mutClass", "IMDplot", "IMD"], header=0, skiprows=[0], engine='python')
  File "/home/kbrar/miniforge3/envs/sigprofilerclusters/lib/python3.9/site-packages/pandas/util/_decorators.py", line 211, in wrapper
    return func(*args, **kwargs)
  File "/home/kbrar/miniforge3/envs/sigprofilerclusters/lib/python3.9/site-packages/pandas/util/_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
  File "/home/kbrar/miniforge3/envs/sigprofilerclusters/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 950, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/kbrar/miniforge3/envs/sigprofilerclusters/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 611, in _read
    return parser.read(nrows)
  File "/home/kbrar/miniforge3/envs/sigprofilerclusters/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1778, in read
    ) = self._engine.read( # type: ignore[attr-defined]
  File "/home/kbrar/miniforge3/envs/sigprofilerclusters/lib/python3.9/site-packages/pandas/io/parsers/python_parser.py", line 282, in read
    alldata = self._rows_to_cols(content)
  File "/home/kbrar/miniforge3/envs/sigprofilerclusters/lib/python3.9/site-packages/pandas/io/parsers/python_parser.py", line 1045, in _rows_to_cols
    self._alert_malformed(msg, row_num + 1)
  File "/home/kbrar/miniforge3/envs/sigprofilerclusters/lib/python3.9/site-packages/pandas/io/parsers/python_parser.py", line 765, in _alert_malformed
    raise ParserError(msg)
pandas.errors.ParserError: Expected 13 fields in line 638201, saw 25

Any assistance would be greatly appreciated. Thank you so much for your prompt responses and help!!
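The ParserError means one row of the tab-separated mutations file has 25 fields where the 13 named columns were expected. One possibility, not confirmed in this thread, is that two 13-field records ended up on a single line: if the newline between them is missing, the last field of the first record fuses with the first field of the second, giving 13 + 13 - 1 = 25 fields. The sketch below scans a file for such rows using only the standard library; `find_malformed_lines` is a hypothetical helper, and the line number pandas reports may not match the raw line number exactly because of the skipped header rows.

```python
from collections import Counter


def find_malformed_lines(path, expected_fields=13, sep="\t", limit=10):
    """Scan a delimited text file and report lines with an unexpected field count.

    Returns (histogram of field counts, list of (line_number, field_count)).
    Line numbers are 1-based; at most `limit` offending lines are kept.
    """
    histogram = Counter()
    offenders = []
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line_number, line in enumerate(handle, start=1):
            n_fields = len(line.rstrip("\n").split(sep))
            histogram[n_fields] += 1
            if n_fields != expected_fields and len(offenders) < limit:
                offenders.append((line_number, n_fields))
    return histogram, offenders
```

Running this on the file that `rainfall` was reading would show whether the 25-field row is an isolated case or one of many, which helps decide between a single corrupted line and leftover output from an earlier run.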

kbrar4013 commented 7 months ago

Hi, I'll also just attach the log file for SigProfilerClusters:

-------Python and Package Versions-------
Python Version: 3.9.0
SigProfilerMatrixGenerator Version: 1.2.25
SigProfilerPlotting version: 1.3.20
matplotlib version: 3.4.3
scipy version: 1.12.0
numpy version: 1.26.4

-------Vital Parameters Used for the execution -------
Project: MOCHA_0324_subs
Genome: GRCh38
Context: ['96']
interdistance: False
input_path: /home/kbrar/MOCHA_Jan_30_2024/Somatic_SNV_vcf/unzipped_SNV_vcfs/info_SNV_vcfs/
output_type: all

-------Date and Time Data-------
Date and Clock time when the execution started: 2024-03-26 15:50:56.093666

-------Runtime Checkpoints-------
Starting matrix generation for SNVs and DINUCs...
Starting matrix generation for SNVs and DINUCs...
Starting matrix generation for SNVs and DINUCs...
Starting matrix generation for SNVs and DINUCs...
Starting matrix generation for SNVs and DINUCs...
Starting matrix generation for SNVs and DINUCs...
Starting matrix generation for SNVs and DINUCs...
Starting matrix generation for SNVs and DINUCs...
Starting matrix generation for SNVs and DINUCs...
Completed! Elapsed time: 2.76 seconds. Matrices generated for 51 samples with 0 errors. Total of 719 SNVs, 10 DINUCs, and 0 INDELs were successfully analyzed.
Completed! Elapsed time: 2.82 seconds. Matrices generated for 47 samples with 0 errors. Total of 552 SNVs, 16 DINUCs, and 0 INDELs were successfully analyzed.
Completed! Elapsed time: 2.85 seconds. Matrices generated for 38 samples with 0 errors. Total of 1801 SNVs, 76 DINUCs, and 0 INDELs were successfully analyzed.
Completed! Elapsed time: 2.87 seconds. Matrices generated for 72 samples with 0 errors. Total of 1184 SNVs, 0 DINUCs, and 0 INDELs were successfully analyzed.
Completed! Elapsed time: 3.5 seconds. Matrices generated for 85 samples with 0 errors. Total of 6183 SNVs, 78 DINUCs, and 0 INDELs were successfully analyzed.
Completed! Elapsed time: 3.78 seconds. Matrices generated for 88 samples with 0 errors. Total of 9255 SNVs, 180 DINUCs, and 0 INDELs were successfully analyzed.
Completed! Elapsed time: 6.37 seconds. Matrices generated for 92 samples with 0 errors. Total of 24380 SNVs, 12190 DINUCs, and 0 INDELs were successfully analyzed.
Completed! Elapsed time: 6.61 seconds. Matrices generated for 92 samples with 0 errors. Total of 49833 SNVs, 411 DINUCs, and 0 INDELs were successfully analyzed.
Completed! Elapsed time: 10.23 seconds. Matrices generated for 92 samples with 0 errors. Total of 75397 SNVs, 12601 DINUCs, and 0 INDELs were successfully analyzed.

MousumyCSE commented 7 months ago

Hi @kbrar4013 ,

Thanks for sharing!

Could you share one of your example files so that I can run it on my end? In the meantime, I suggest removing the previous log and output files and then re-running both SigProfilerSimulator and SigProfilerClusters.

Best, Mousumy
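The clean-up suggested above can be scripted so that stale results from an earlier run are not mixed into the next one. This is a minimal sketch: the subfolder names "output" and "logs" are assumptions about where the tools write under the input path, and `clear_previous_outputs` is a hypothetical helper. It defaults to a dry run that only lists what would be deleted.

```python
import shutil
from pathlib import Path


def clear_previous_outputs(input_path, subfolders=("output", "logs"), dry_run=True):
    """List (and, if dry_run is False, delete) result folders under input_path.

    The default subfolder names are assumptions; adjust them to your layout.
    Files directly inside input_path (for example the input VCFs) are untouched.
    """
    removed = []
    for name in subfolders:
        target = Path(input_path) / name
        if target.is_dir():
            removed.append(str(target))
            if not dry_run:
                shutil.rmtree(target)
    return removed
```

Run it once with the default `dry_run=True`, check the returned paths, and only then call it again with `dry_run=False` before re-running the two tools.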

kbrar4013 commented 7 months ago

Hi, sure I will email you the file! Thanks

MousumyCSE commented 5 months ago

Hi @kbrar4013 ,

I suggested a solution by email but haven't heard back from you. Please reopen this issue if you run into any further problems.

Best, Mousumy