AlexandrovLab / SigProfilerExtractor

SigProfilerExtractor allows de novo extraction of mutational signatures from data generated in a matrix format. The tool identifies the number of operative mutational signatures, their activities in each sample, and the probability for each signature to cause a specific mutation type in a cancer sample. The tool makes use of SigProfilerMatrixGenerator and SigProfilerPlotting.
BSD 2-Clause "Simplified" License

Error while running the tool #251

Closed itigupta2429 closed 18 hours ago

itigupta2429 commented 3 weeks ago

Hi Team,

I was running SigProfilerExtractor on my cancer cohort and encountered the following error:

Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
  File "/home/iti.gupta1/mambaforge/lib/python3.10/site-packages/SigProfilerExtractor/sigpro.py", line 636, in sigProfilerExtractor
    data = datadump.SigProfilerMatrixGeneratorFunc(
  File "/home/iti.gupta1/mambaforge/lib/python3.10/site-packages/SigProfilerMatrixGenerator/scripts/SigProfilerMatrixGeneratorFunc.py", line 2690, in SigProfilerMatrixGeneratorFunc
    raise ValueError(
ValueError: Error: More than 30% of mutations were skipped. Please check the log file for more information.

The out file inside the logs folder looks like this:


Genome: GRCh38
Input File Path: multisample_VCFs/PASS_calls/split_VCF/tumor_sample_VCF/SMP8_only/input/
exome: False
bed_file: None
chrom_based: False
plot: False
tsb_stat: False
seqInfo: True

-------Date and Time Data-------
Date and Clock time when the execution started: 2024-07-02 14:04:26.655209

-------Runtime Checkpoints-------
Chromosome 10 done
Chromosome 11 done
Chromosome 12 done
Chromosome 13 done
Chromosome 14 done
Chromosome 15 done
Chromosome 16 done
Chromosome 17 done
Chromosome 18 done
Chromosome 19 done
Chromosome 1 done
Chromosome 20 done
Chromosome 21 done
Chromosome 22 done
Chromosome 2 done
Chromosome 3 done
Chromosome 4 done
Chromosome 5 done
Chromosome 6 done
Chromosome 7 done
Chromosome 8 done
Chromosome 9 done
Chromosome MT done
Chromosome X done
Chromosome Y done
There appears to be a duplicate single base substitution. Skipping this mutation: 1010378CC
CTTAACCTTAA
There appears to be a duplicate single base substitution. Skipping this mutation: 1010378CC
CTTAACCTTAA
There appears to be a duplicate single base substitution. Skipping this mutation: 1010378CC
CTTAACCTTAA
There appears to be a duplicate single base substitution. Skipping this mutation: 1010378CC
CTTAACCTTAA
There appears to be a duplicate single base substitution. Skipping this mutation: 1010378CC
CTTAACCTTAA
There appears to be a duplicate single base substitution. Skipping this mutation: 1010378CC
CTTAACCTTAA
There appears to be a duplicate single base substitution. Skipping this mutation: 1010378CC
CTTAACCTTAA
There appears to be a duplicate single base substitution. Skipping this mutation: 1010378CC
CTTAACCTTAA 

I also tried running it for an individual patient with 8 samples (8 separate VCFs), but I got the same error. Could you please help me resolve this error?

mdbarnesUCSD commented 2 weeks ago

Hi @itigupta2429,

The matrix generation error "More than 30% of mutations were skipped" usually indicates that the wrong reference genome was used.
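
One way to sanity-check the genome choice before rerunning is to compare the REF alleles in the VCF against the reference sequence at the same positions. The following is only a minimal sketch of that idea, not SigProfilerMatrixGenerator's internal check: the function name `ref_mismatch_fraction`, the in-memory `reference` dictionary, and the toy records are all hypothetical, and in practice the sequences would have to be loaded from the genome build you intend to use.

```python
def ref_mismatch_fraction(records, reference):
    """Return the fraction of records whose REF allele disagrees with the reference.

    records   -- iterable of (chrom, pos, ref, alt) tuples, pos is 1-based as in VCF
    reference -- dict mapping chromosome name to its sequence (hypothetical, in memory)
    """
    records = list(records)
    if not records:
        return 0.0
    mismatches = 0
    for chrom, pos, ref, _alt in records:
        seq = reference.get(chrom)
        # VCF positions are 1-based; Python slices are 0-based.
        observed = seq[pos - 1 : pos - 1 + len(ref)] if seq is not None else None
        # A missing chromosome (e.g. "chr1" vs "1") counts as a mismatch too.
        if observed is None or observed.upper() != ref.upper():
            mismatches += 1
    return mismatches / len(records)


# Toy data for illustration only; not taken from the log above.
toy_reference = {"1": "ACGTACGTAC"}
toy_records = [
    ("1", 1, "A", "T"),     # REF matches the reference
    ("1", 2, "C", "G"),     # REF matches the reference
    ("1", 3, "T", "A"),     # mismatch: the reference has G here
    ("chr1", 4, "T", "C"),  # chromosome name not present in the reference
]

fraction = ref_mismatch_fraction(toy_records, toy_reference)
print(f"{fraction:.0%} of REF alleles disagree with the reference")
```

A high mismatch fraction on real data would be consistent with the VCFs having been called against a different build than the one passed to the tool, or with a chromosome naming difference.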

Regarding the duplicate single base substitution messages, it may be helpful to review this issue: https://github.com/AlexandrovLab/SigProfilerMatrixGenerator/issues/153
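
To see whether a VCF really contains repeated records, the variants can be counted by their (CHROM, POS, REF, ALT) key. This is a rough sketch using only the standard library; it is not taken from the linked issue, the helper name `find_duplicate_variants` is hypothetical, and the VCF lines are made-up toy data.

```python
from collections import Counter


def find_duplicate_variants(vcf_lines):
    """Return {(chrom, pos, ref, alt): count} for variants that appear more than once."""
    counts = Counter()
    for line in vcf_lines:
        # Skip blank lines and VCF header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt = fields[:5]
        counts[(chrom, int(pos), ref, alt)] += 1
    return {key: n for key, n in counts.items() if n > 1}


# Toy VCF content for illustration only.
toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tC\tT\t.\tPASS\t.",
    "1\t100\t.\tC\tT\t.\tPASS\t.",  # exact repeat of the previous record
    "1\t200\t.\tG\tA\t.\tPASS\t.",
]

duplicates = find_duplicate_variants(toy_vcf)
for (chrom, pos, ref, alt), n in duplicates.items():
    print(f"{chrom}:{pos} {ref}>{alt} appears {n} times")
```

For a real file, the lines could be read with `open(path)` and passed to the same function; compressed VCFs would need `gzip.open(path, "rt")` instead.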