Closed · Adifo closed this issue 2 years ago
Hi Adifo,
Errors with Pathway Tools occur quite often; we do our best to prevent them or to help troubleshoot them, but it is not always easy. It could also be an error on our side, through the mpwt dependency.
Could you please share with us some logs that you might have? Cheers
Hi Clémence,
I've overwritten the logs leading to the File Error 17, but I will try to recreate the circumstances. Now that I've figured out most of the issues regarding the Prokka output and the new gbff format, the number of errors dropped, and I managed to obtain all my SBML files in two subsequent runs.
First, here is the log of the first run leading to the API limitation error. The input directory gbk contains 404 directories, each with a corresponding .gbk/.gbff file.
Command:
m2m workflow -g gbk -s seeds_workflow.sbml -o reconTestOutput -c 10 --clean
m2m_workflow.log: start_workflow.log
Directories count:
echo -ne "Input\t";  find ./gbk/ -name 'pathologic.log' | wc -l
echo -ne "Local\t";  find ~/ptools-local/pgdbs/ -name 'gc*' -type d | wc -l
echo -ne "Output\t"; find reconTestOutput/pgdb/ -name 'GC*' -type d | wc -l
Input 404
Local 404
Output 397
Second, here is the rerun to complete the failing ones.
Command:
m2m workflow -g gbk -s seeds_workflow.sbml -o reconTestOutput -c 10
m2m_workflow.log: second.log
Directories count:
echo -ne "Input\t";  find ./gbk/ -name 'pathologic.log' | wc -l
echo -ne "Local\t";  find ~/ptools-local/pgdbs/ -name 'gc*' -type d | wc -l
echo -ne "Output\t"; find reconTestOutput/pgdb/ -name 'GC*' -type d | wc -l
Input 379
Local 404
Output 404
In this case, I managed to finish the workflow, so I'm fine with the result. Having fixed the content of the gbk files beforehand (Locus line format errors, missing db_xref entries, erroneous taxon_id values) probably helped. But I've lost some PathoLogic logs in my input dir, and mpwt still displayed several errors while trying to rerun on genomes that were tagged as already present.
From what I remember of the m2m code (reconstruction.py), the full list of input genomes is compared to the PGDB content, and the mpwt step is skipped only if the two sets match fully. If the match is incomplete, the whole input dir goes to mpwt without removing the genomes already processed. Maybe adding a skip list to the multiprocess_pwt function could help.
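To make the suggestion concrete, here is a minimal sketch of that skip list as a set difference. It assumes (this is my guess, not verified against the mpwt source) that each PGDB folder is named after the lower-cased input folder name plus a 'cyc' suffix, as the gc*cyc directories in ptools-local suggest:

```python
def genomes_to_process(input_ids, pgdb_folders):
    """Return input genome IDs that have no matching PGDB folder yet.

    Hypothetical helper: assumes PGDB folders are named
    '<lower-cased genome id>cyc', e.g. 'gcf_000005845cyc'.
    """
    built = {p[:-3].lower() for p in pgdb_folders if p.lower().endswith('cyc')}
    return sorted(g for g in input_ids if g.lower() not in built)

# Example: two of three inputs already have a PGDB, only the third is rerun.
remaining = genomes_to_process(
    ['GCF_000005845', 'GCF_000006945', 'GCF_000007445'],
    ['gcf_000005845cyc', 'gcf_000006945cyc'],
)
print(remaining)  # ['GCF_000007445']
```

Only the genomes returned by such a helper would then be handed to PathoLogic, instead of the whole input directory.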
Hi @Adifo,
As you already found a solution, I will just comment on what the error was.
During the inference, Pathway Tools runs a step that loads NCBI citations. With multiprocessing, Pathway Tools sometimes sends too many queries to the NCBI server (we have X Pathway Tools processes that can query NCBI, X being the number of CPUs given with the -c option).
This can lead to the following error:
Fatal error: XML not well-formed - unrecognized content '{"error":"API rate limit exceeded","api-'
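For context, the NCBI E-utilities cap anonymous clients at roughly 3 requests per second (if I remember the published limit correctly), and with -c 10 there are ten independent PathoLogic processes that know nothing about each other's requests, so the combined rate easily exceeds the cap. Purely as an illustration of the kind of throttle that keeps a single client under such a limit (this is not something Pathway Tools exposes):

```python
import time

class RateLimiter:
    """Minimal interval-based throttle: at most max_per_sec calls per second."""

    def __init__(self, max_per_sec):
        self.min_interval = 1.0 / max_per_sec
        self.last = 0.0

    def wait(self):
        # Sleep just long enough so calls are spaced by min_interval.
        delay = self.last + self.min_interval - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        self.last = time.monotonic()

limiter = RateLimiter(3)   # ~3 requests/second, the anonymous NCBI cap
start = time.monotonic()
for _ in range(4):         # 4 calls must span at least ~1 second in total
    limiter.wait()         # a real client would issue its HTTP query here
print(f"elapsed: {time.monotonic() - start:.2f}s")
```

Ten parallel processes would each need to share one such limiter (or the per-process rate would have to be divided by X), which is exactly what independent PathoLogic processes cannot do.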
There is an option to avoid loading the NCBI citations, named ###Batch-PathoLogic-Download-Pubmed-Entries? (in the ptools-init.dat file located inside the ptools-local folder), but I never got it to work.
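For anyone who wants to try it anyway, the edit would presumably be to uncomment that line in ptools-init.dat and give it a value; both the effect and the exact value syntax are assumptions on my part (untested, Lisp-style boolean guessed from the file's conventions):

```
## Uncommented from the shipped default; NIL is assumed to disable
## the PubMed download step during batch PathoLogic runs (untested).
Batch-PathoLogic-Download-Pubmed-Entries? NIL
```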
For your second issue: indeed, Metage2Metabo only checks whether the output folder contains all the PGDBs of the associated organisms. All other cases are handled by mpwt. But as I modified the behavior of mpwt in version 0.7.0, it is possible there are some issues.
I will try to investigate this when I have some time. Just to be sure, can you send me your mpwt version, so I can check whether it is greater than or equal to 0.7.0?
Thanks for the insight on the API error. I will keep that in mind when defining the number of threads next time.
Regarding mpwt, I've used mpwt version 0.7.1.
Here is the full list of software in my environment, from micromamba and from pip.
From my perspective, you can close this issue but feel free to leave it open and reach out to me for feedback.
micromamba list
List of packages in environment: "/home/adf/micromamba/envs/m2m"
Name Version Build Channel
────────────────────────────────────────────────────────────────
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 1_gnu conda-forge
ca-certificates 2021.10.8 ha878542_0 conda-forge
ld_impl_linux-64 2.36.1 hea4e1c9_2 conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libgcc-ng 11.2.0 h1d223b6_14 conda-forge
libgomp 11.2.0 h1d223b6_14 conda-forge
libnsl 2.0.0 h7f98852_0 conda-forge
libstdcxx-ng 11.2.0 he4da1e4_14 conda-forge
libzlib 1.2.11 h166bdaf_1014 conda-forge
ncurses 6.3 h9c3ff4c_0 conda-forge
openssl 1.1.1n h166bdaf_0 conda-forge
pip 21.3.1 pyhd8ed1ab_0 conda-forge
python 3.6.15 hb7a2778_0_cpython conda-forge
python_abi 3.6 2_cp36m conda-forge
readline 8.1 h46c0cb4_0 conda-forge
setuptools 58.0.4 py36h5fab9bb_2 conda-forge
sqlite 3.37.1 h4ff8645_0 conda-forge
tk 8.6.12 h27826a3_0 conda-forge
wheel 0.37.1 pyhd8ed1ab_0 conda-forge
xz 5.2.5 h516909a_1 conda-forge
zlib 1.2.11 h166bdaf_1014 conda-forge
pip list
Package Version
------------------- ---------
anyio 3.5.0
appdirs 1.4.4
argcomplete 2.0.0
argh 0.26.2
Arpeggio 2.0.0
async-generator 1.10
attrs 21.4.0
biopython 1.79
bubbletools 0.6.11
certifi 2021.10.8
cffi 1.15.0
chardet 4.0.0
charset-normalizer 2.0.12
clingo 5.5.1
clyngor 0.4.2
clyngor-with-clingo 5.3.post1
cobra 0.24.0
commonmark 0.9.1
contextvars 2.4
dataclasses 0.8
decorator 4.4.2
depinfo 1.7.0
diskcache 5.4.0
docopt 0.6.2
ete3 3.1.1
future 0.18.2
gffutils 0.10.1
graphviz 0.19.1
h11 0.12.0
httpcore 0.14.7
httpx 0.22.0
idna 3.3
immutables 0.17
importlib-metadata 4.8.3
importlib-resources 5.4.0
iniconfig 1.1.1
lxml 4.8.0
MeneTools 3.2.1
Metage2Metabo 1.5.0
Miscoto 3.1.2
mpmath 1.2.1
mpwt 0.7.1
networkx 2.5.1
numpy 1.19.5
optlang 1.5.2
packaging 21.3
padmet 5.0.1
pandas 1.1.5
phasme 0.0.16
pip 21.3.1
pluggy 1.0.0
powergrasp 0.8.18
py 1.11.0
pycparser 2.21
pydantic 1.9.0
pydot 1.4.2
pyfaidx 0.6.4
Pygments 2.11.2
pyparsing 3.0.7
pyPEG2 2.15.2
pytest 7.0.1
python-dateutil 2.8.2
python-libsbml 5.19.2
pytz 2022.1
rfc3986 1.5.0
rich 12.2.0
ruamel.yaml 0.17.21
ruamel.yaml.clib 0.2.6
setuptools 58.0.4
simplejson 3.17.6
six 1.16.0
sniffio 1.2.0
swiglpk 5.0.5
sympy 1.9
tomli 1.2.3
typing_extensions 4.1.1
wheel 0.37.1
zipp 3.6.0
Hello,
I am trying to use M2M on a set of a few hundred bacterial genomes. I am using m2m 1.5.0 with Pathway Tools 25.5 in a conda environment with Python 3.6.
My issue is linked to the recon step and to populating the local PGDB instance with my own genomes. During the process, I start getting error messages about having reached an API limit, usually after 256 genome insertions. Rerunning the same command lets me go forward, but I then get error messages about the PathoLogic files (already present in the PGDB) or about erroneous flat files from mpwt. If I restart with the same outdir and the --clean option, the accessory .log and .lisp files are removed from my input directory, as well as the content of my local PGDB instance (ptools-local). It keeps all the entries inside my outdir but then fails on a File Error 17 because the genome is already present.
Basically, I think something weird is going on during the verification step that decides between creating the flat files and inserting them into the PGDB. It is difficult for me to figure out, as the log states that some genomes will be skipped because they are already present, yet I still get an error message about them in the end, probably because three sets of genome entries have to be kept in sync (input dir, output dir, local PGDB).
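A small diagnostic along those lines could make the mismatch visible at a glance. The helper name and the three input sets are hypothetical; they stand for the genome identifiers found in the input dir, the output pgdb dir, and ptools-local, normalised to a common case:

```python
def cross_check(input_ids, output_ids, local_ids):
    """Report, per genome, the locations it is missing from.

    Hypothetical diagnostic for the three-way sync problem: the three
    arguments are sets of genome IDs found in the m2m input dir, the
    output pgdb dir, and the local ptools-local instance.
    """
    everything = set(input_ids) | set(output_ids) | set(local_ids)
    report = {}
    for genome in sorted(everything):
        missing = [name for name, ids in (('input', input_ids),
                                          ('output', output_ids),
                                          ('local', local_ids))
                   if genome not in ids]
        if missing:
            report[genome] = missing
    return report

# Example: a genome built in ptools-local but absent from the output dir.
print(cross_check({'gcf_1', 'gcf_2'}, {'gcf_1'}, {'gcf_1', 'gcf_2'}))
# {'gcf_2': ['output']}
```

An empty report would mean the three locations agree and a rerun should have nothing left to do.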
Could you provide a guideline on how to run the workflow properly while avoiding the issues linked to that Pathway Tools API limitation?
Thank you