AuReMe / metage2metabo

From annotated genomes to metabolic screening in large scale microbiotas
https://metage2metabo.readthedocs.io
GNU Lesser General Public License v3.0

Error during recon: XML not well formed #17

Closed: KDeaton closed this issue 3 years ago

KDeaton commented 3 years ago

With certain .gbff files, I get the following error during the recon step:

Fatal error: XML not well-formed - encountered token at illegal syntax position: 'START-TAG' following: '(:COMMENT "[if gt IE 8]>

Typically, it affects all genomes from a particular metagenomic assembly. Any advice for dealing with this? Is it a matter of annotation quality? Do you have recommendations for screening which annotated assemblies will run without issues?

Thanks!

ArnaudBelcour commented 3 years ago

Hi @KDeaton,

This error comes from Pathway Tools during the m2m recon run. It happens when Pathway Tools queries NCBI for an article reference and the query fails, in which case Pathway Tools returns this error. Pathway Tools also returns this error when no internet connection is available.

So the problem does not necessarily come from your data. It happens especially often with mpwt or m2m recon, because multiple Pathway Tools processes query NCBI at the same time and NCBI can block some of these queries.

One way to solve this issue is to relaunch m2m recon with the same input and output folders. The tool will detect the successful and failed runs and relaunch Pathway Tools PathoLogic only on the failed ones. This new run will also send fewer queries to NCBI than the previous one (because some metabolic networks have already been built), so the error is less likely to happen.
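The relaunch strategy amounts to a retry loop. The Python sketch below is illustrative only: `recon_command`, `run_recon` and `relaunch_until_done` are hypothetical helpers, and the `m2m recon -g <genomes> -o <output>` command form is an assumption to check against `m2m recon -h`. It reruns recon with the same folders, relying on the behavior described in this comment (only failed runs are redone), and stops once no genome fails or after a fixed number of attempts.

```python
import subprocess
from typing import Callable, List, Set


def recon_command(genomes_dir: str, output_dir: str) -> List[str]:
    # ASSUMED command form; confirm the option names with `m2m recon -h`.
    return ["m2m", "recon", "-g", genomes_dir, "-o", output_dir]


def run_recon(genomes_dir: str, output_dir: str) -> None:
    # Do not raise on a non-zero exit: some genomes may fail and be retried.
    subprocess.run(recon_command(genomes_dir, output_dir), check=False)


def relaunch_until_done(
    run_once: Callable[[], None],
    list_failed: Callable[[], Set[str]],
    max_attempts: int = 3,
) -> Set[str]:
    """Rerun recon with the same input/output until nothing fails.

    Returns the genomes still failing after the last attempt
    (an empty set means every genome eventually succeeded).
    """
    failed: Set[str] = set()
    for _ in range(max_attempts):
        # Per the maintainer, each relaunch only redoes the failed PathoLogic
        # runs, so it sends fewer NCBI queries than the previous attempt.
        run_once()
        failed = list_failed()
        if not failed:
            break
    return failed
```

For example, `relaunch_until_done(lambda: run_recon("genomes", "output"), list_failed)` where `list_failed` is any function you provide that returns the names of genomes without a successful build. Keeping the runner and the failure check as callables makes the loop easy to test without Pathway Tools installed.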

Information about which builds passed and which failed may be in the .log file inside the output folder, but sometimes this file cannot be created. In that case, you can use the command mpwt --list to see all the Pathway Tools PGDBs, that is, all the assemblies whose metabolic network was successfully reconstructed by Pathway Tools. In your input folder, each sub-folder also contains a pathologic.log file; in this file, successful runs are marked by a "PGDB build done" message.
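Checking each pathologic.log by hand gets tedious with many genomes. The Python sketch below scans the input folder and classifies each genome sub-folder. It is a hypothetical helper (`classify_runs` is not part of m2m or mpwt), and it assumes, based only on this comment, that a successful run leaves the text "PGDB build done" in `<input_folder>/<genome>/pathologic.log`; verify that marker against your own logs.

```python
from pathlib import Path
from typing import Dict

# ASSUMPTION: marker text taken from the maintainer's description above.
SUCCESS_MARKER = "PGDB build done"


def classify_runs(input_folder: str) -> Dict[str, str]:
    """Return {genome_subfolder: "passed" | "failed" | "not run"}."""
    status: Dict[str, str] = {}
    for sub in sorted(Path(input_folder).iterdir()):
        if not sub.is_dir():
            continue  # skip stray files; each genome lives in its own sub-folder
        log = sub / "pathologic.log"
        if not log.exists():
            status[sub.name] = "not run"
        elif SUCCESS_MARKER in log.read_text(errors="replace"):
            status[sub.name] = "passed"
        else:
            status[sub.name] = "failed"
    return status
```

The genomes reported as "failed" or "not run" are the ones a relaunch of m2m recon with the same input and output would be expected to redo.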

Sorry for the delay in answering.

KDeaton commented 3 years ago

Hi Arnaud, thanks for the tip! Indeed, I first tried a test subset of genomes that had previously gone through the whole workflow with no issues, and now they were failing too. Recon launches Pathway Tools with a "no-patch-download" flag, so I ran pathway-tools directly; it downloaded a patch, and everything is running fine now.

The delay was no problem, you're always super helpful and responsive. Thank you!