EI-CoreBioinformatics / reat

Robust Eukaryotic Annotation Toolkit
https://reat.readthedocs.io/en/latest/
MIT License

Transcriptome workflow, mikado crashes at serialise step #45

Closed: elisagold closed this issue 1 year ago

elisagold commented 1 year ago

Hello,

I am trying to run the transcriptome workflow of reat, but I am getting stuck at the serialise step. I already had a look at the mikado issues page but couldn't find a solution there. I installed reat in a conda environment created with mamba env create -f reat/reat.yml; in reat.yml I pinned python=3.8 and pip=21.3.1, and in setup.py I changed the mikado dependency to the fix_install branch. reat itself was then installed with pip install ./reat --no-cache-dir.
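
The setup.py change was roughly the following (only a sketch; the actual dependency line in reat's setup.py may be written differently):

    # sketch of the setup.py edit: point the mikado requirement at the fix_install branch
    install_requires = [
        # ... other requirements left unchanged ...
        "mikado @ git+https://github.com/EI-CoreBioinformatics/mikado.git@fix_install",
    ]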

The inputs for mikado serialise were all created, but no mikado.db was made.

This is the error reported in serialise.log:

1a.hisat.scallop_1a_SCLP.18757.5.0  0   2365    ID=1a.hisat.scallop_1a_SCLP.18757.5.0;coding=False  4.1 -   2180    2332    0   1   183 2181
2023-10-24 12:07:41,200 - Bed12ParseWrapper-19 - bed12.py:1871 - WARNING - run - Bed12ParseWrapper-19 - Invalid entry, reason: Invalid CDS length: 152 % 3 = 2 (1734-1885, 0)
1a.hisat.scallop_1a_SCLP.16680.10.0 0   1894    ID=1a.hisat.scallop_1a_SCLP.16680.10.0;coding=False 3.2 -   1733    1885    0   1   159 1734
2023-10-24 12:07:41,976 - serialise - orf.py:448 - INFO - __serialize_multiple_threads - MainProcess - Finished loading 57955 ORFs into the database
2023-10-24 12:07:42,358 - serialise - serialise.py:187 - INFO - load_orfs - MainProcess - Finished loading ORF data
2023-10-24 12:07:42,370 - serialise - serialise.py:142 - INFO - load_blast - MainProcess - Starting to load BLAST data
2023-10-24 12:07:42,371 - serialise - blast_serialiser.py:82 - INFO - __init__ - MainProcess - Number of dedicated workers: 40
2023-10-24 12:07:44,024 - serialise - blast_serialiser.py:249 - INFO - __serialize_targets - MainProcess - Started to serialise the targets
2023-10-24 12:07:45,101 - serialise - blast_serialiser.py:283 - INFO - __serialize_targets - MainProcess - Loaded 377931 objects into the "target" table
2023-10-24 12:07:45,124 - serialise - blast_serialiser.py:174 - INFO - __serialize_queries - MainProcess - Started to serialise the queries
2023-10-24 12:07:45,148 - serialise - blast_serialiser.py:226 - INFO - __serialize_queries - MainProcess - Loaded 0 objects into the "query" table
2023-10-24 12:07:45,151 - serialise - tab_serialiser.py:31 - INFO - _serialise_tabular - MainProcess - Creating a pool with 40 workers for analysing BLAST results
2023-10-24 12:07:46,058 - serialise - tabular_utils.py:431 - INFO - parse_tab_blast - MainProcess - Reading /data/elisa/spirogyra_genome/annotation/transcriptome_workflow/cromwell-executions/ei_annotation/ee7f86bc-60c8-42a6-8c0c-23b7dce589be/call-wf_main_mikado/wf_main_mikado/330e702b-9b00-4783-b796-f3aaf179c544/call-Mikado_short_and_long/wf_mikado/eb03adb7-86bf-45d0-bb05-054137e14b77/call-MikadoSerialise/inputs/746107790/mikado_diamond_homology.tsv data
2023-10-24 12:07:48,322 - serialise - serialise.py:388 - ERROR - serialise - MainProcess - Mikado crashed due to an error. Please check the logs for hints on the cause of the error; if it is a bug, please report it to https://github.com/EI-CoreBioinformatics/mikado/issues.
2023-10-24 12:07:48,322 - serialise - serialise.py:390 - ERROR - serialise - MainProcess - Cannot use a compiled regex as replacement pattern with regex=False
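
As far as I can tell, pandas raises exactly this message when Series.str.replace is called with a compiled regex while regex=False; the default for regex changed from True to False in pandas 2.0. I don't know whether that is the precise call that fails inside tabular_utils.py, but as a minimal illustration (the series and pattern here are made up):

    import re
    import pandas as pd

    queries = pd.Series(["query1|extra", "query2|extra"])
    pat = re.compile(r"\|.*$")  # pattern only for illustration

    # Works on any recent pandas: compiled pattern with regex=True
    queries.str.replace(pat, "", regex=True)

    # On pandas >= 2.0 the default is regex=False, so this raises
    # ValueError: Cannot use a compiled regex as replacement pattern with regex=False
    queries.str.replace(pat, "")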

Here is the list of programs in my conda environment: REAT_conda_env.txt

Here is the general reat log file: log.out.txt

The workflow runs successfully on the same machine with the same data using a version of reat that was installed about a year ago (it is not mine and carries some in-house patches, so I cannot simply reuse it), so the input files should be fine.

elisagold commented 1 year ago

I managed to fix my installation by downgrading numpy, sqlalchemy and pandas:

    pip install numpy==1.23.0 --no-cache-dir
    pip install sqlalchemy==1.4.38 --no-cache-dir
    pip install pandas==1.4.3 --no-cache-dir
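
To confirm that the environment really picked up the pinned versions, something like this can be run inside it (just a quick check, not part of the workflow):

    import numpy, pandas, sqlalchemy
    # expecting 1.23.0 / 1.4.3 / 1.4.38 after the downgrades above
    print(numpy.__version__, pandas.__version__, sqlalchemy.__version__)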

For reference, here is my working environment file: working_reat_env.yml.txt