cmayer / MitoGeneExtractor

MitoGeneExtractor can be used to extract protein-coding mitochondrial genes, such as COI, from short- and long-read sequencing libraries.
GNU Affero General Public License v3.0

ERROR: Running exonerate failed. #8

Closed: carla-hazelf closed this issue 6 months ago

carla-hazelf commented 1 year ago

Hello,

Thank you so much for making this tool!

I am running MitoGeneExtractor on Illumina PE data and trying to extract the genes in a protein reference .fasta. With the command below, the programme runs for ~2 hours and uses 95% of the memory I give it (669 GB of 700 GB):

$mitogene -q $R1 -q $R2 -p $fasta_reference -o out-alignment.fas -n 0 -c out-consensus.fas -t 0.5 -r 1 -C 2

With these dependencies:

singularity/3.7.3 exonerate/2.2.0

And then crashes with this error:

WARNING: You did not specify a vulgar file, so a temporary vulgar file will be created for this run that will be removed at the end of this program run. Therefor the vulgar file cannot be reused in other runs.
Filename:./Concatenated_exonerate_input_XXXXXX
Filename:./Concatenated_exonerate_input_XXXXXX
Filename:./Concatenated_exonerate_input_jvB6Ri
sh: line 1: 3274460 Segmentation fault      (core dumped) exonerate --geneticcode FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG --frameshift -9 --query /nfs/scratch/finnca/mitoproteins.fasta -Q protein --target ./Concatenated_exonerate_input_jvB6Ri -T dna --model protein2dna --showalignment 0 --showvulgar 1 > tmp-vulgar.txt 2> tmp-vulgar.txt.log
sh: line 1: 3274503 Segmentation fault      (core dumped) exonerate --geneticcode FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG --frameshift -9 --query /nfs/scratch/finnca/mitoproteins.fasta -Q protein --target ./Concatenated_exonerate_input_jvB6Ri -T dna --model protein2dna --showalignment 0 --showvulgar 1 > tmp-vulgar.txt 2> tmp-vulgar.txt.log
sh: line 1: 3274606 Segmentation fault      (core dumped) exonerate --geneticcode FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG --frameshift -9 --query /nfs/scratch/finnca/mitoproteins.fasta -Q protein --target ./Concatenated_exonerate_input_jvB6Ri -T dna --model protein2dna --showalignment 0 --showvulgar 1 > tmp-vulgar.txt 2> tmp-vulgar.txt.log
sh: line 1: 3274710 Segmentation fault      (core dumped) exonerate --geneticcode FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG --frameshift -9 --query /nfs/scratch/finnca/mitoproteins.fasta -Q protein --target ./Concatenated_exonerate_input_jvB6Ri -T dna --model protein2dna --showalignment 0 --showvulgar 1 > tmp-vulgar.txt 2> tmp-vulgar.txt.log
sh: line 1: 3274809 Segmentation fault      (core dumped) exonerate --geneticcode FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG --frameshift -9 --query /nfs/scratch/finnca/mitoproteins.fasta -Q protein --target ./Concatenated_exonerate_input_jvB6Ri -T dna --model protein2dna --showalignment 0 --showvulgar 1 > tmp-vulgar.txt 2> tmp-vulgar.txt.log
sh: line 1: 3275376 Segmentation fault      (core dumped) exonerate --geneticcode FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG --frameshift -9 --query /nfs/scratch/finnca/mitoproteins.fasta -Q protein --target ./Concatenated_exonerate_input_jvB6Ri -T dna --model protein2dna --showalignment 0 --showvulgar 1 > tmp-vulgar.txt 2> tmp-vulgar.txt.log
sh: line 1: 3275671 Segmentation fault      (core dumped) exonerate --geneticcode FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG --frameshift -9 --query /nfs/scratch/finnca/mitoproteins.fasta -Q protein --target ./Concatenated_exonerate_input_jvB6Ri -T dna --model protein2dna --showalignment 0 --showvulgar 1 > tmp-vulgar.txt 2> tmp-vulgar.txt.log
sh: line 1: 3275785 Segmentation fault      (core dumped) exonerate --geneticcode FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG --frameshift -9 --query /nfs/scratch/finnca/mitoproteins.fasta -Q protein --target ./Concatenated_exonerate_input_jvB6Ri -T dna --model protein2dna --showalignment 0 --showvulgar 1 > tmp-vulgar.txt 2> tmp-vulgar.txt.log
sh: line 1: 3275887 Segmentation fault      (core dumped) exonerate --geneticcode FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG --frameshift -9 --query /nfs/scratch/finnca/mitoproteins.fasta -Q protein --target ./Concatenated_exonerate_input_jvB6Ri -T dna --model protein2dna --showalignment 0 --showvulgar 1 > tmp-vulgar.txt 2> tmp-vulgar.txt.log
sh: line 1: 3275983 Segmentation fault      (core dumped) exonerate --geneticcode FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG --frameshift -9 --query /nfs/scratch/finnca/mitoproteins.fasta -Q protein --target ./Concatenated_exonerate_input_jvB6Ri -T dna --model protein2dna --showalignment 0 --showvulgar 1 > tmp-vulgar.txt 2> tmp-vulgar.txt.log
ERROR: Running exonerate failed. The generated vulgar file is incomplete and should be removed manually. Exiting.

Please let me know what further information I can provide to help resolve this issue (and I'm sorry if it's just that I've misunderstood something!)

marievalerie commented 1 year ago

Hi, I think this is a segmentation fault, i.e. an invalid memory access. I believe this could happen if exonerate needs more memory than you have made available. @cmayer what do you think?

Could you try allocating more memory and see whether it again uses 95% and then crashes? (Or, in the best case, finishes.)
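
One way to check whether memory is the limit is to rerun the exonerate command from the log by hand on a subset of reads and look at its peak memory. This is a sketch under stated assumptions: exonerate is given FASTA input (so the reads are converted first), the reads are uncompressed 4-line FASTQ records, and `/usr/bin/time` is GNU time (its `-v` report includes "Maximum resident set size"). The function names `fastq_head_to_fasta` and `run_exonerate_timed` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: rerun the exonerate call from the log on a subset of reads and
# record its peak memory with GNU time.
# ASSUMPTIONS: 4-line uncompressed FASTQ records; exonerate gets FASTA input;
# /usr/bin/time is GNU time. Function names are hypothetical.
set -euo pipefail

# Write the first N reads of a FASTQ file as FASTA (header + sequence only).
fastq_head_to_fasta() {
  local fq=$1 n=$2 out=$3
  awk -v n="$n" '
    NR > 4 * n { exit }
    NR % 4 == 1 { print ">" substr($0, 2) }
    NR % 4 == 2 { print }
  ' "$fq" > "$out"
}

# Same exonerate arguments as in the failing run. The genetic-code string is
# quoted so that the shell cannot treat '**' as a glob pattern.
# GNU time writes its report to stderr, so it ends up in PREFIX.log.
run_exonerate_timed() {
  local query=$1 target=$2 prefix=$3
  /usr/bin/time -v exonerate \
    --geneticcode 'FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG' \
    --frameshift -9 --query "$query" -Q protein \
    --target "$target" -T dna --model protein2dna \
    --showalignment 0 --showvulgar 1 \
    > "${prefix}.vulgar.txt" 2> "${prefix}.log"
}
```

For example, `fastq_head_to_fasta "$R1" 100000 subset.fa`, then `run_exonerate_timed "$fasta_reference" subset.fa subset` and `grep 'Maximum resident' subset.log`. Repeating this with larger subsets should indicate how peak memory grows with the number of reads.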

However, 700 GB of RAM is already a lot, and I would have expected it to be sufficient for Illumina data. What kind of data set are you analyzing? Is it RNA-seq? How large are your input files, and how many protein sequences are in your reference FASTA file?
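
The size and count questions above can be answered with standard command-line tools. A minimal sketch, assuming 4-line FASTQ records (typical for Illumina) and that files ending in `.gz` are gzip-compressed; `count_fasta_seqs`, `count_fastq_reads` and `report_inputs` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: report input file sizes, read counts and the number of reference
# protein sequences. ASSUMPTIONS: 4-line FASTQ records; *.gz means gzip.
set -euo pipefail

# Number of sequences in a FASTA file = number of header lines starting with '>'.
count_fasta_seqs() {
  # grep -c prints 0 but exits with status 1 when nothing matches.
  grep -c '^>' "$1" || true
}

# Number of reads in a FASTQ file (plain or gzip-compressed).
count_fastq_reads() {
  local lines
  case $1 in
    *.gz) lines=$(gzip -cd "$1" | wc -l) ;;
    *)    lines=$(wc -l < "$1") ;;
  esac
  echo $(( lines / 4 ))
}

# Print the reference sequence count, then size and read count per FASTQ file.
# Usage: report_inputs REF.fasta R1.fastq [R2.fastq ...]
report_inputs() {
  local ref=$1 fq
  shift
  echo "reference: $(count_fasta_seqs "$ref") protein sequences"
  for fq in "$@"; do
    echo "$fq: $(du -h "$fq" | cut -f1), $(count_fastq_reads "$fq") reads"
  done
}
```

With the variables from the original command, this would be called as `report_inputs "$fasta_reference" "$R1" "$R2"`.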

Let's see what Christoph says. Maybe you could send me some of your files and your reference, and I will have a closer look.

Marie

cmayer commented 1 year ago

Many thanks for reporting this problem!

The error message says that the exonerate program crashed. This normally happens when exonerate runs out of memory. Are the two files you are analysing particularly large? Can you tell me their sizes? Could you try to analyse them separately, even though I understand that you want a result for the combined input?

If the two files can be analysed separately, the following workaround is possible: combine the two vulgar files manually and combine the two FASTQ files manually. With these combined files, MitoGeneExtractor should run without any issues. If this works, I could also change the way multiple input files are handled. Currently, the input files are combined and exonerate is called once. Alternatively, the files could be passed to exonerate separately and the exonerate result files combined within MitoGeneExtractor.
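
The workaround described above could look roughly like the following sketch. It assumes that MitoGeneExtractor's `-V <file>` option names the vulgar file and that an existing vulgar file is reused instead of rerunning exonerate (the warning in the log suggests a named vulgar file can be reused, but check `MitoGeneExtractor -h`). It copies the remaining options from the original command and expects plain FASTQ input. `MITOGENE`, `REF`, `concat_files`, `run_per_file` and `run_merged` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch of the workaround: one exonerate run (via MitoGeneExtractor) per input
# file, then merge reads and vulgar files and run once more on the merged data.
# ASSUMPTIONS: -V <file> sets the vulgar file and an existing one is reused.
# MITOGENE (path to the binary) and REF (protein FASTA) are hypothetical
# variables that the caller must set.
set -euo pipefail

# Concatenate input files into OUT. Usage: concat_files OUT IN1 [IN2 ...]
concat_files() {
  local out=$1
  shift
  cat "$@" > "$out"
}

# Step 1: run once per FASTQ file; run i writes its vulgar file to part<i>.vulgar.
run_per_file() {
  local i=0 fq
  for fq in "$@"; do
    i=$((i + 1))
    "$MITOGENE" -q "$fq" -p "$REF" -V "part${i}.vulgar" \
      -o "part${i}-alignment.fas" -c "part${i}-consensus.fas" \
      -n 0 -t 0.5 -r 1 -C 2
  done
}

# Steps 2 and 3: merge the FASTQ files and the per-file vulgar files, then run
# once on the merged inputs, pointing -V at the merged vulgar file.
run_merged() {
  local n=$#
  concat_files merged.fastq "$@"
  local vulgars=() i
  for (( i = 1; i <= n; i++ )); do vulgars+=("part${i}.vulgar"); done
  concat_files merged.vulgar "${vulgars[@]}"
  "$MITOGENE" -q merged.fastq -p "$REF" -V merged.vulgar \
    -o out-alignment.fas -c out-consensus.fas -n 0 -t 0.5 -r 1 -C 2
}
```

Usage: set `MITOGENE=$mitogene` and `REF=$fasta_reference`, then call `run_per_file "$R1" "$R2"` followed by `run_merged "$R1" "$R2"` with the same files in the same order.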