RemiAllio / MitoFinder

MitoFinder: efficient automated large-scale extraction of mitogenomic data from high throughput sequencing data

cElementTree.ParseError #39

Closed sihellem closed 1 year ago

sihellem commented 1 year ago

Hi Rémi,

I have been annotating a lot of samples recently with MitoFinder v1.4, but one consistently fails for no obvious reason.

Here is what the log says:

Command line: /apps/unit/BourguignonU/MitoFinder/1.4/mitofinder --seqid Glossotermes_M2_ID283_S26_L002 --assembly M2_ID283_S26_L002_spades_scaffolds.fasta --tRNA-annotation mitfi --processors 20 --organism 5 --refseq sequence.gb

Start time : 2022-10-13 13:08:37
Job name = M2_ID283_S26_L002

Creating Output directory : M2_ID283_S26_L002
All results will be written here
Program folders:
MEGAHIT = /apps/unit/BourguignonU/MitoFinder/1.4/megahit/
Blast folder = /apps/unit/BourguignonU/MitoFinder/1.4/blast/bin/
IDBA-UD folder = /apps/unit/BourguignonU/MitoFinder/1.4/idba/bin/
MetaSPAdes folder = /apps/unit/BourguignonU/MitoFinder/1.4/metaspades/bin/
ARWEN folder = /apps/unit/BourguignonU/MitoFinder/1.4/arwen/
MiTFi folder = /apps/unit/BourguignonU/MitoFinder/1.4/mitfi/
tRNAscan-SE folder = /apps/unit/BourguignonU/MitoFinder/1.4/trnascanSE/tRNAscan-SE-2.0/

Formatting database for mitochondrial contigs identification...
Running mitochondrial contigs identification step...

MitoFinder found 176 contigs matching provided mitochondrial reference(s)
Did not check for circularization

.........
.........
.........

Creating summary statistics for mtDNA contig 65
Looking for best reference genes for mtDNA contig 65
Annotating mtDNA contig 65
tRNA annotation with MitFi run well.
Annotation completed

Creating summary statistics for mtDNA contig 66
Looking for best reference genes for mtDNA contig 66
Annotating mtDNA contig 66
tRNA annotation with MitFi run well.
ERROR: Gene annotation failed for mtDNA contig 66.
Please check  M2_ID283_S26_L002/geneChecker_error.log to see what happened
Aborting

And here is what is found in M2_ID283_S26_L002/geneChecker_error.log:

Traceback (most recent call last):
  File "/apps/unit/BourguignonU/MitoFinder/1.4/geneChecker_fasta.py", line 445, in <module>
    x = geneCheck(fastaReference, resultFile, percent_equality_prot, percent_equality_nucl, True, blastFolder, organismType, alignCutOff)
  File "/apps/unit/BourguignonU/MitoFinder/1.4/geneChecker_fasta.py", line 263, in geneCheck
    for qresult in blastparse: #in each query, let's look for a good hit
  File "/hpcshare/appsunit/BourguignonU/MitoFinder/1.4/Bio/SearchIO/__init__.py", line 314, in parse
    generator = iterator(source_file, **kwargs)
  File "/hpcshare/appsunit/BourguignonU/MitoFinder/1.4/Bio/SearchIO/BlastIO/blast_xml.py", line 190, in __init__
    self._meta, self._fallback = self._parse_preamble()
  File "/hpcshare/appsunit/BourguignonU/MitoFinder/1.4/Bio/SearchIO/BlastIO/blast_xml.py", line 204, in _parse_preamble
    for event, elem in self.xml_iter:
  File "<string>", line 107, in next
cElementTree.ParseError: no element found: line 1, column 0

I checked the contig #66 but could not find any issue with it.

Any idea what is happening here?

Thanks in advance for your reply.

Cheers, Simon

RemiAllio commented 1 year ago

Hi Simon,

Thank you for your message. It’s difficult for me to understand this issue without the data.
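That said, the last line of the traceback, `no element found: line 1, column 0`, is the message Python's XML parser gives for empty input. So one possible reading, which I cannot confirm without the files, is that the BLAST XML result for contig 66 was empty or never written. Below is a minimal sketch of how that could be checked. It uses only the standard library, not MitoFinder's own code, and the function name is hypothetical:

```python
import os
import xml.etree.ElementTree as ET


def check_blast_xml(path):
    """Classify a BLAST XML output file as 'missing', 'empty', 'malformed' or 'ok'."""
    if not os.path.exists(path):
        return "missing"
    if os.path.getsize(path) == 0:
        # A zero-byte file is what triggers "no element found: line 1, column 0".
        return "empty"
    try:
        # Stream through the file so large BLAST results are not held in memory.
        for _event, elem in ET.iterparse(path):
            elem.clear()
    except ET.ParseError:
        # Non-empty but truncated or otherwise invalid XML.
        return "malformed"
    return "ok"
```

Calling `check_blast_xml` on the intermediate BLAST XML file for the failing contig (its exact name depends on the run, so it is not given here) would tell an empty result apart from a truncated one.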

However, you have a lot (!) of contigs matching the reference, and they are ordered by their level of similarity to the provided reference(s), so I am wondering whether you need all of them to be annotated. If not, you may want to use the --max-contig option to annotate only the first contigs and avoid the error. This is the easiest workaround, but if you really want to annotate all the contigs, we can think about another solution.
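If you are scripting many samples, the option could be appended in a small wrapper like the sketch below. The helper name and the cutoff of 10 are assumptions for illustration, and the sketch assumes --max-contig takes the number of contigs to keep. The other arguments are copied from your log:

```python
import shlex


def build_mitofinder_command(seqid, assembly, refseq, max_contig):
    """Return the argument list from the log with --max-contig appended."""
    return [
        "mitofinder",
        "--seqid", seqid,
        "--assembly", assembly,
        "--refseq", refseq,
        "--tRNA-annotation", "mitfi",
        "--processors", "20",
        "--organism", "5",
        "--max-contig", str(max_contig),
    ]


cmd = build_mitofinder_command(
    "Glossotermes_M2_ID283_S26_L002",
    "M2_ID283_S26_L002_spades_scaffolds.fasta",
    "sequence.gb",
    max_contig=10,  # assumed cutoff, not a recommendation
)
# Print the command for inspection instead of running it.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Printing the command first makes it easy to check before passing the list to a job scheduler or to `subprocess.run`.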

Tell me what you think, Best, Rémi

sihellem commented 1 year ago

Dear Rémi,

Thanks for your reply.

I was just curious about why this happened.

However, as I indeed do not need all contigs to be annotated, I will settle for annotating fewer of them.

Thanks again! Simon

RemiAllio commented 1 year ago

Dear Simon,

I will let you know if I find anything about this strange issue!

Did using the --max-contig option work?

Best, Rémi

sihellem commented 1 year ago

Dear Rémi,

I did not use the option, as I could actually retrieve what I needed from the failed run.

However, I can send you the problematic file by e-mail if you want to investigate the issue more closely. Is your @umontpellier.fr address still current?

Cheers, Simon

RemiAllio commented 1 year ago

Dear Simon,

Thank you for your reply.

Here is my current e-mail address: remi.allio@inrae.fr. I will let you know if I find anything!

Best, Rémi