asoltis / MutEnricher

Somatic coding and non-coding mutation enrichment analysis for tumor WGS data

Issue with Output Results Using Custom GTF Files in MutEnricher #8

Open fiou98 opened 6 days ago

fiou98 commented 6 days ago

Hello,

I am using MutEnricher and have run into a problem with the output when I use my own GTF files. I created custom GTF files for both the hg19 and hg38 genome references, and the analysis runs to completion without any errors. However, the output generated with my custom GTF files contains no genes at all, and the result files themselves are empty. The .log file reports the following:

"Loaded 111309 genes from input GTF file. 0 total non-silent somatic mutations identified in 0 genes. 1294774 total somatic mutations identified in gene limits. Identified 0 candidate hotspots in 0 genes for testing. 0 gene enrichment results reported. 0 gene hotspot enrichment results reported."

In contrast, the output from the GTF file provided by the MutEnricher developers does include gene data. I would like to understand why the results differ. Are there specific annotations, filters, or parameters in the provided GTF file that my custom versions are missing? Any guidance on how to make sure genes are included in the results would be greatly appreciated.

Thank you for your help!

Best regards, André Fiou

asoltis commented 6 days ago

Hi Andre,

Can you answer/check a few things:

Anthony

fiou98 commented 6 days ago

Sure!

Let me know if you have other questions. Thanks,

André

asoltis commented 6 days ago

At first glance, the GTFs look fine, but I'm not sure why you would run mixed genome alignment tests (the test VCFs are aligned to hg19, so there is no reason to run the hg38 GTF with these). Regardless, in your test output it looks like the coding analysis found mutations within the gene limits but no non-silent mutations. This would suggest an issue with the variant annotations and/or the annotation definitions provided to the code. What software did you use to perform variant annotation, and can you provide the command line you are running?

Also, I'm not fully clear on this, but are you saying your custom hg19 GTF did not work with the provided sample VCFs?
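
One way to narrow this down is to tally the functional-annotation values that actually appear in the VCFs and compare them with the terms being treated as non-silent. The sketch below is only an illustration and is not MutEnricher code: the INFO key `ExonicFunc.refGene` (ANNOVAR-style), the example term set, and the toy records are all assumptions, so substitute whatever your annotator writes and whatever your terms file contains.

```python
from collections import Counter


def tally_info_values(vcf_lines, info_key="ExonicFunc.refGene"):
    """Count how often each value of one INFO key occurs in VCF records.

    info_key is an assumption (ANNOVAR-style naming); change it to match
    the annotator that produced your VCFs.
    """
    counts = Counter()
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue
        # INFO is the 8th VCF column: semicolon-separated KEY=VALUE entries.
        for entry in fields[7].split(";"):
            key, _, value = entry.partition("=")
            if key == info_key:
                counts[value] += 1
    return counts


# Toy records, made up for illustration only.
toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t150\t.\tA\tG\t.\tPASS\tGene.refGene=GENE_A;ExonicFunc.refGene=nonsynonymous_SNV",
    "chr1\t250\t.\tC\tT\t.\tPASS\tGene.refGene=GENE_A;ExonicFunc.refGene=synonymous_SNV",
    "chr2\t900\t.\tG\tA\t.\tPASS\tGene.refGene=GENE_B;ExonicFunc.refGene=stopgain",
]

# Hypothetical set of terms considered non-silent.
nonsilent_terms = {"nonsynonymous_SNV", "stopgain"}

counts = tally_info_values(toy_vcf)
matched = sum(n for value, n in counts.items() if value in nonsilent_terms)
print(dict(counts))
print("records matching a non-silent term:", matched)
```

If the matched count is zero on real data, the terms and the annotation values do not line up, which would be consistent with a log reporting mutations in gene limits but no non-silent mutations.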

fiou98 commented 2 days ago

I ran the mixed alignment only as a test, aware that it could not work. My custom hg19 GTF file didn’t work with your test VCFs, which suggests that the issue may be related to the GTF file itself, as reflected in the different output. The command line I am using is:

python ~/MutEnricher/mutEnricher.py coding ~/MutEnricher/example_data/annotation_files/hg19_noYM ~/MutEnricher/example_data/vcf_files.txt --anno-type nonsilent_terms.txt -o ~/MutEnricher/output_files/my_GTF_test --prefix test_global

asoltis commented 2 days ago

Andre,

Can you share your hg19 GTF with me so I can have a look at it on my end? Also - what format of input data are you using for your own data? If VCF, what annotator have you used for identifying non-silent mutations in these?

Anthony

asoltis commented 10 hours ago

Andre,

Thinking about this further, I actually think the issue is a mismatch between the gene names (gene ids) in your GTF files and what is annotated in the gene fields of the test VCFs. MutEnricher's coding analysis checks not only the genomic coordinates of mutations but also the gene id/name when looking for non-silent variants. Thus, I'd recommend annotating your VCFs according to the GTF file(s) you plan to use in your analysis.

Anthony
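
A quick way to check for the gene-name mismatch described above is to collect the gene identifiers from the GTF and from the VCF gene annotations and see how much they overlap. The following is a minimal sketch under stated assumptions, not part of MutEnricher: the GTF attribute `gene_name`, the ANNOVAR-style INFO key `Gene.refGene`, the comma-separated multi-gene values, and the toy records are all hypothetical and should be adapted to your files.

```python
import re


def gtf_gene_names(gtf_lines, attribute="gene_name"):
    """Collect the values of one attribute from the 9th column of GTF lines."""
    # Anchor on start-of-field or ';' so that e.g. 'gene_id' does not
    # match inside a longer attribute name.
    pattern = re.compile(r'(?:^|;\s*)' + re.escape(attribute) + r'\s+"([^"]+)"')
    names = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        match = pattern.search(fields[8])
        if match:
            names.add(match.group(1))
    return names


def vcf_gene_names(vcf_lines, info_key="Gene.refGene"):
    """Collect gene names stored under one INFO key (assumed key name)."""
    names = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue
        for entry in fields[7].split(";"):
            key, _, value = entry.partition("=")
            if key == info_key and value:
                # Assumption: several genes, if present, are comma-separated.
                names.update(v for v in value.split(",") if v)
    return names


# Toy inputs, made up for illustration only.
toy_gtf = [
    'chr1\ttoy\tgene\t100\t500\t.\t+\t.\tgene_id "ID_0001"; gene_name "GENE_A";',
    'chr2\ttoy\tgene\t800\t1200\t.\t-\t.\tgene_id "ID_0002"; gene_name "GENE_B_ALT";',
]
toy_vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t150\t.\tA\tG\t.\tPASS\tGene.refGene=GENE_A",
    "chr2\t900\t.\tG\tA\t.\tPASS\tGene.refGene=GENE_B",
]

gtf_names = gtf_gene_names(toy_gtf)
vcf_names = vcf_gene_names(toy_vcf)
shared = gtf_names & vcf_names
vcf_only = vcf_names - gtf_names

print("shared:", sorted(shared))
print("in VCF annotations but not in GTF:", sorted(vcf_only))
```

On real files, pass open file handles instead of the toy lists. A large "in VCF annotations but not in GTF" set would point to the naming mismatch, and re-annotating the VCFs against the same GTF should shrink it.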