AlexandrovLab / SigProfilerMatrixGenerator

SigProfilerMatrixGenerator creates mutational matrices for all types of somatic mutations. It allows downsizing the generated mutations to only parts of the genome (e.g., the exome or a custom BED file). The tool seamlessly integrates with other SigProfiler tools.
BSD 2-Clause "Simplified" License

IndexError: index out of range in MutationMatrixGenerator.py #196

Angel030331 closed this issue 2 months ago

Angel030331 commented 2 months ago

Analyze.cosmic_fit(samples='/autofs/bal34/okwong/cancer_stat_proj/SigProfiler/lab_data', output="/autofs/bal34/okwong/cancer_stat_proj/SigProfiler/lab_data/lab_data_test_GRCh37_10092024", input_type="vcf", context_type="96", genome_build="GRCh38", cosmic_version=3.4, make_plots=True, verbose=True)

The code above results in the error below:

File /autofs/bal36/zxzheng/env/conda/envs/mamba/envs/somatic/lib/python3.9/site-packages/SigProfilerMatrixGenerator/scripts/MutationMatrixGenerator.py:451, in catalogue_generator_single(lines, chrom, mutation_dict, mutation_dinuc_pd_all, mutation_types_tsb_context, vcf_path, vcf_path_original, vcf_files, bed_file_path, chrom_path, project, output_matrix, context, exome, genome, ncbi_chrom, functionFlag, bed, bed_ranges, chrom_based, plot, tsb_ref, transcript_path, tsb_stat, seqInfo, gs, log_file, volume)
    448 mut_seq += previous_mut
    450 for l in range(start1 + 1, start2, 1):
--> 451 mnv_seq += tsb_ref[chrom_string[l - 1]][1]
    452 mut_seq += tsb_ref[chrom_string[l - 1]][1]
    454 if i < len(mnv_index) - 1:

IndexError: index out of range

There is an 'input' directory in '/autofs/bal34/okwong/cancer_stat_proj/SigProfiler/lab_data' that contains the VCF files.

mdbarnesUCSD commented 2 months ago

Hi,

Could you please let us know which versions of the SigProfiler packages you have installed? I suspect there is an issue with your VCF input files. Could you please share a snippet of the input that reproduces your error? You can e-mail it to me; my contact info is in the README.

Thanks!

Angel030331 commented 2 months ago

-------Python and Package Versions-------
Python Version: 3.9.0
SigProfilerMatrixGenerator Version: 1.2.28
SigProfilerPlotting version: 1.3.24
matplotlib version: 3.5.3
statsmodels version: 0.14.2
scipy version: 1.13.1
pandas version: 1.4.4
numpy version: 1.23.2

Angel030331 commented 2 months ago

I have emailed you the code snippets, thanks

Angel030331 commented 2 months ago

Please see the following information

[image: Screenshot 2024-09-11 at 00.47.06.png] [image: Screenshot 2024-09-11 at 00.47.13.png]

Yours sincerely, Angel Wong On Ki, Faculty of Science, The University of Hong Kong


mdbarnesUCSD commented 2 months ago

Hi @Angel030331,

I contacted you for more information about the input VCF that you are running. I have not heard back yet, so I am sharing an example VCF file that you can find on our wiki page (https://osf.io/s93d5/wiki/3.%20Using%20the%20Tool%20-%20SBS,%20ID,%20DBS%20Input/). Please see the file called example.vcf there.

Angel030331 commented 2 months ago

To whom it may concern,

Sorry for the late reply; I missed the email. Please see the attached files. These are the VCF files in the input directory. Please also see the attached error message. I apologize again, and thank you so much for your time and effort.

HG008_T_N-P_HiFi_GRCh38-GIABv3_DeepVariant_snv.vcf https://drive.google.com/file/d/1X3EHmYovW4P79klfe2zLxc0XIcHnIw4z/view?usp=drive_web
HG008_T_N-P_Ilmn_GRCh38-GIABv3_ClairS_snv.vcf https://drive.google.com/file/d/1Og6KvaPTbwePyiTEFKpr5auzAAMwkNOf/view?usp=drive_web
HG008_T_N-P_Ilmn_GRCh38-GIABv3_Dragen_snv.vcf https://drive.google.com/file/d/1M6gDbARYJaHwAar6sKha6uWRujLR0aFd/view?usp=drive_web
HG008_T_N-P_ONT_GRCh37_ClairS_snv.vcf https://drive.google.com/file/d/1RAgufBcEyN3pZK2bHrQUXAslbF6Zrt5g/view?usp=drive_web

[image: image.png] [image: image.png]

Yours sincerely, Angel Wong On Ki, Faculty of Science, The University of Hong Kong


mdbarnesUCSD commented 2 months ago

Hi @Angel030331,

I tested creating matrices with your input files and these seem to be the source of the issue.

The Dragen file contains non-canonical chromosomes (i.e., chr_Un...); you will want to filter these out prior to your run. Additionally, the HG008_T_N-P_ONT_GRCh37_ClairS_snv.vcf input file is GRCh37, not GRCh38 like the rest of the files in this analysis, so it should not be run with genome_build="GRCh38". There were no issues with matrix generation for HG008_T_N-P_Ilmn_GRCh38-GIABv3_ClairS_snv.vcf.
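For anyone hitting the same error, the two checks above could be sketched roughly as follows. This is a minimal illustration, not part of SigProfilerMatrixGenerator: the helper names (is_canonical, filter_canonical, build_hints) are hypothetical, and it assumes that "canonical" means chromosomes 1-22, X, Y and the mitochondrial contig, with or without a "chr" prefix, in an uncompressed, tab-separated VCF.

```python
import re

# Assumption: canonical = 1-22, X, Y, M/MT, optionally prefixed with "chr".
CANONICAL = re.compile(r"^(chr)?([1-9]|1[0-9]|2[0-2]|X|Y|M|MT)$")


def is_canonical(chrom):
    """Return True for primary chromosomes, False for other contigs."""
    return bool(CANONICAL.match(chrom))


def filter_canonical(lines):
    """Yield header lines unchanged and records on canonical chromosomes only."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        # The first tab-separated column of a VCF record is CHROM.
        chrom = line.split("\t", 1)[0]
        if is_canonical(chrom):
            yield line


def build_hints(lines):
    """Collect meta-information lines that name a reference or assembly.

    These lines are optional in a VCF, so an empty result does not prove
    anything; the output is only a hint for spotting a mixed-build input set.
    """
    hints = []
    for line in lines:
        if not line.startswith("##"):
            break  # meta-information lines end at the #CHROM header
        if line.startswith("##reference") or "assembly=" in line:
            hints.append(line.rstrip("\n"))
    return hints


def filter_vcf_file(src_path, dst_path):
    """Write a copy of src_path that keeps only canonical-chromosome records."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(filter_canonical(src))
```

Calling filter_vcf_file on each input and placing the filtered copies in the input directory would cover the first point. Comparing the build_hints output across files is one way to notice a GRCh37 file among GRCh38 ones before running the analysis.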

Angel030331 commented 2 months ago

Thanks @mdbarnesUCSD, thank you so much for the feedback. I will check them out.

mdbarnesUCSD commented 2 months ago

Please reach out if you have any additional issues.