AlexandrovLab / SigProfilerMatrixGenerator

SigProfilerMatrixGenerator creates mutational matrices for all types of somatic mutations. It allows restricting the generated mutations to parts of the genome (e.g., the exome or a custom BED file). The tool integrates with the other SigProfiler tools.
BSD 2-Clause "Simplified" License

Difficult creation of matrix #150

Closed bioinfo-dirty-jobs closed 1 year ago

bioinfo-dirty-jobs commented 1 year ago

I followed the tutorial and tried to import both VCF and MAF files. The error is always the same (Python 3.10.12):


`matrices = matGen.SigProfilerMatrixGeneratorFunc("test", "GRCh37", "/home/bioinfo/Desktop/test", plot=True, exome=False, bed_file=None, chrom_based=False, tsb_stat=False, seqInfo=False, cushion=0)

File ~/miniconda3/envs/spmg_r_1.2.13/lib/python3.10/site-packages/SigProfilerMatrixGenerator/scripts/SigProfilerMatrixGeneratorFunc.py:2700, in SigProfilerMatrixGeneratorFunc(project, reference_genome, path_to_input_files, exome, bed_file, chrom_based, plot, tsb_stat, seqInfo, cushion, gs)
   2695     # Raise an error when more than 30% of mutations are skipped
   2696     if (
   2697         skipped_muts
   2698         > (analyzed_muts[0] + analyzed_muts[1] + analyzed_muts[2]) * 0.3
   2699     ):
-> 2700         raise ValueError(
   2701             "Error: More than 30% of mutations were skipped. Please check the log file for more information."
   2702         )
   2703 return matrices

ValueError: Error: More than 30% of mutations were skipped. Please check the log file for more information.`

The log file shows:

The mutation base is not recognized. Skipping this mutation: 3 187446908 T .
The mutation base is not recognized. Skipping this mutation: 3 187446909 T .
The mutation base is not recognized. Skipping this mutation: 3 187446910 G .
The mutation base is not recognized. Skipping this mutation: 3 187446911 G .
The mutation base is not recognized. Skipping this mutation: 3 187446912 G .
The mutation base is not recognized. Skipping this mutation: 3 187446913 G .
The mutation base is not recognized. Skipping this mutation: 3 187446914 A .
The mutation base is not recognized. Skipping this mutation: 3 187446915 C .
The mutation base is not recognized. Skipping this mutation: 3 187446916 T .
The mutation base is not recognized. Skipping this mutation: 3 187446917 G .
The mutation base is not recognized. Skipping this mutation: 3 187446918 G .
The mutation base is not recognized. Skipping this mutation: 3 187446919 A .
The mutation base is not recognized. Skipping this mutation: 3 187446920 G .
The mutation base is not recognized. Skipping this mutation: 3 187446921 G .
The mutation base is not recognized. Skipping this mutation: 3 187446922 T .
The mutation base is not recognized. Skipping this mutation: 3 187446923 C .
The mutation base is not recognized. Skipping this mutation: 3 187446924 A .
The mutation base is not recognized. Skipping this mutation: 3 187446925 A .
The mutation base is not recognized. Skipping this mutation: 3 187446926 G .
The mutation base is not recognized. Skipping this mutation: 3 187446927 G 
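To see at a glance which values the tool is rejecting, the skipped-mutation lines can be tallied. The following is a minimal sketch, assuming the four fields after "Skipping this mutation:" are chromosome, position, REF and ALT (an interpretation of the log excerpt above, not a documented format); `tally_skipped_alts` is a hypothetical helper, not part of SigProfilerMatrixGenerator.

```python
from collections import Counter

# Marker text taken from the log excerpt above.
PREFIX = "Skipping this mutation:"


def tally_skipped_alts(log_lines):
    """Count the ALT field of every skipped-mutation log line.

    Assumption: the fields after PREFIX are chrom, pos, ref, alt.
    Lines with fewer than four fields (e.g. truncated) are ignored.
    """
    counts = Counter()
    for line in log_lines:
        if PREFIX not in line:
            continue
        fields = line.split(PREFIX, 1)[1].split()
        if len(fields) < 4:
            continue
        counts[fields[3]] += 1
    return counts


sample = [
    "The mutation base is not recognized. Skipping this mutation: 3 187446908 T .",
    "The mutation base is not recognized. Skipping this mutation: 3 187446909 T .",
]
print(tally_skipped_alts(sample))
```

If nearly every skipped line has "." in the last field, the input most likely contains records without an alternate allele (in VCF, "." in ALT means no alternate allele was called) or the columns are not where the tool expects them.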

None of the chromosome names have the chr prefix. I am using an enriched gene panel aligned to hg19 to try this package. Could you please help me?

mdbarnesUCSD commented 1 year ago

Hi @bioinfo-dirty-jobs,

Please check the order of columns in your VCF file. The REF column is supposed to be followed by ALT.
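A quick way to check this is to read the VCF header line and count how many records have no alternate allele. This is a minimal sketch using only the standard library; `inspect_vcf` is a hypothetical helper, not part of SigProfilerMatrixGenerator, and it assumes a tab-delimited, uncompressed VCF. Without a `#CHROM` header line it falls back to the standard VCF positions (REF in column 4, ALT in column 5).

```python
def inspect_vcf(lines):
    """Return (ref_before_alt, n_records, n_no_alt) for an iterable of VCF lines.

    ref_before_alt is True/False when a '#CHROM ...' header naming both REF
    and ALT is found, and None when it is not.
    n_no_alt counts records whose ALT field is '.' or empty.
    """
    ref_before_alt = None
    ref_idx, alt_idx = 3, 4  # standard VCF column positions (0-based)
    n_records = 0
    n_no_alt = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # blank line or meta-information line
        if line.startswith("#"):
            cols = line.lstrip("#").split("\t")
            if "REF" in cols and "ALT" in cols:
                ref_idx, alt_idx = cols.index("REF"), cols.index("ALT")
                ref_before_alt = alt_idx == ref_idx + 1
            continue
        fields = line.split("\t")
        if len(fields) <= max(ref_idx, alt_idx):
            continue  # malformed or truncated record
        n_records += 1
        if fields[alt_idx] in {".", ""}:
            n_no_alt += 1
    return ref_before_alt, n_records, n_no_alt


# Usage (the path is a placeholder):
# with open("sample.vcf") as fh:
#     print(inspect_vcf(fh))
```

If the first value is False, the REF and ALT columns are swapped or separated in the header. If the third value is a large share of the second, most records carry no alternate allele, which would be consistent with the "." values in the log above.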

mdbarnesUCSD commented 1 year ago

Please re-open this issue if it was not resolved.