EI-CoreBioinformatics / mikado

Mikado is a lightweight Python3 pipeline whose purpose is to facilitate the identification of expressed loci from RNA-Seq data and to select the best models in each locus.
https://mikado.readthedocs.io/en/stable/
GNU Lesser General Public License v3.0

mikado serialize orfs error #309

Closed bbista closed 3 years ago

bbista commented 4 years ago

Hello,

I am getting an unusual error in the mikado serialise step of the pipeline. The serialise.log file is below. To be clear, I prepared the ORF BED file from mikado_prepared.fasta using TransDecoder. I am using version v2.0rc6.

2020-06-04 14:50:10,397 - serialiser - serialise.py:290 - INFO - setup - MainProcess - Command line: /home/bbista/.local/bin/mikado serialise --json-conf configuration.yaml --xml mikado.blast.xml.gz --orfs mikado_prepared.fasta.transdecoder.bed --blast_targets /work/LAS/nvalenzu-lab/bbista/uniprot/CPIprotein.faa -p 16 --transcripts mikado_prepared.fasta
2020-06-04 14:50:10,398 - serialiser - serialise.py:296 - INFO - setup - MainProcess - Random seed: 985109527
2020-06-04 14:50:10,398 - serialiser - serialise.py:334 - INFO - setup - MainProcess - Using a sqlite database (location: /work/LAS/nvalenzu-lab/bbista/STRannot/mikado.db)
2020-06-04 14:50:10,398 - serialiser - serialise.py:338 - INFO - setup - MainProcess - Requested 16 threads, forcing single thread: False
2020-06-04 14:50:10,398 - serialiser - serialise.py:140 - INFO - load_orfs - MainProcess - Starting to load ORF data
2020-06-04 14:50:19,982 - serialiser - orf.py:419 - CRITICAL - __serialize_multiple_threads - MainProcess - The provided ORFs do not match the transcripts provided and already present in the database.This could be due to having called the ORFs on a FASTA file different from mikado_prepared.fasta, the output of mikado prepare. If this is the case, please use mikado_prepared.fasta to call the ORFs and then restart mikado serialise using them as input.
2020-06-04 14:50:19,983 - serialiser - serialise.py:149 - CRITICAL - load_orfs - MainProcess - Mikado serialise failed due to problems with the input data. Please check the logs.
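The CRITICAL message says the ORFs must have been called on mikado_prepared.fasta. A quick way to test that independently of Mikado is to compare the transcript IDs in the TransDecoder BED file with the FASTA headers, and to check that no ORF ends past the end of its transcript. The sketch below is a hypothetical diagnostic, not part of Mikado: the names fasta_lengths and check_orfs are illustrative, and it assumes the BED's first column holds the transcript ID and the third column the ORF end in transcript coordinates. It does not reproduce Mikado's own validation logic.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each FASTA record ID (first token after '>') to its sequence length."""
    lengths: Dict[str, int] = {}
    current: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            tokens = line[1:].split()
            current = tokens[0] if tokens else None
            if current is not None:
                lengths[current] = 0
        elif current is not None:
            # Sequence lines may be wrapped, so accumulate their lengths.
            lengths[current] += len(line)
    return lengths


def check_orfs(bed_lines: Iterable[str],
               fasta_lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (IDs absent from the FASTA, IDs whose ORF end exceeds the sequence length).

    Hypothetical helper: assumes column 1 is the transcript ID and column 3 the
    0-based, end-exclusive ORF end on that transcript.
    """
    lengths = fasta_lengths(fasta_lines)
    missing, out_of_bounds = set(), set()
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip blank, comment, track and browser lines, plus malformed rows.
        if len(fields) < 3 or fields[0].startswith(("#", "track", "browser")):
            continue
        tid, end = fields[0], int(fields[2])
        if tid not in lengths:
            missing.add(tid)
        elif end > lengths[tid]:
            out_of_bounds.add(tid)
    return sorted(missing), sorted(out_of_bounds)
```

For real files, pass open file handles, e.g. check_orfs(open("mikado_prepared.fasta.transdecoder.bed"), open("mikado_prepared.fasta")). Two empty lists mean the IDs and lengths are at least consistent; a non-empty list points at transcripts that differ between the two inputs.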

lucventurini commented 4 years ago

Dear @bbista

Many thanks for reporting this. May I ask which version of Mikado you are using? The line number of the error (419) does not match my current version of the code. I suppose you installed it from Conda/PyPI?

I am asking because in 2.0rc2 that message should be on line 449, and the file was changed accordingly over a year ago.

If this is not the case, may I politely ask you to retry with the Conda/PyPI version? It should have the correct code.

Kind regards
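Since the discussion hinges on which Mikado build is actually installed, one way to check is to ask the Python environment directly rather than inferring it from log line numbers. The sketch below uses only the standard library (importlib.metadata and importlib.util); it assumes, without confirmation from this thread, that both the distribution name and the import name are "Mikado", and the helper name describe_install is hypothetical.

```python
from importlib import metadata, util
from typing import Optional, Tuple


def describe_install(dist_name: str = "Mikado",
                     module_name: str = "Mikado") -> Tuple[Optional[str], Optional[str]]:
    """Return (installed version, path the package would be imported from).

    Either value is None when it cannot be determined. Both default names
    are assumptions about how Mikado is packaged.
    """
    try:
        version: Optional[str] = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    # find_spec locates the package without importing it; it returns None
    # for a top-level module that is not installed.
    spec = util.find_spec(module_name)
    location = spec.origin if spec is not None else None
    return version, location
```

Running print(describe_install()) with the same Python interpreter that backs the mikado executable shows which version, and which copy on disk, is being used. This helps when more than one installation (for example a user-level pip install and a Conda environment) is present.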

bbista commented 4 years ago

Dear @lucventurini, I used v2.0rc6. I re-ran serialise using v2.0rc2 and got the same error.

2020-06-08 12:00:58,237 - serialiser - serialise.py:290 - INFO - setup - MainProcess - Command line: /home/bbista/.local/bin/mikado serialise --json-conf configuration.yaml --xml mikado.blast.xml.gz --orfs mikado_prepared.fasta.transdecoder.bed --blast_targets /work/LAS/nvalenzu-lab/bbista/uniprot/CPIprotein.faa
2020-06-08 12:00:58,238 - serialiser - serialise.py:296 - INFO - setup - MainProcess - Random seed: 985109527
2020-06-08 12:00:58,446 - serialiser - serialise.py:334 - INFO - setup - MainProcess - Using a sqlite database (location: /work/LAS/nvalenzu-lab/bbista/STRannot/mikado.db)
2020-06-08 12:00:58,446 - serialiser - serialise.py:338 - INFO - setup - MainProcess - Requested 4 threads, forcing single thread: False
2020-06-08 12:00:58,446 - serialiser - serialise.py:140 - INFO - load_orfs - MainProcess - Starting to load ORF data
2020-06-08 12:00:58,551 - serialiser - orf.py:448 - CRITICAL - __serialize_multiple_threads - MainProcess - The provided ORFs do not match the transcripts provided and already present in the database.This could be due to having called the ORFs on a FASTA file different from mikado_prepared.fasta, the output of mikado prepare. If this is the case, please use mikado_prepared.fasta to call the ORFs and then restart mikado serialise using them as input.
2020-06-08 12:00:58,551 - serialiser - serialise.py:149 - CRITICAL - load_orfs - MainProcess - Mikado serialise failed due to problems with the input data. Please check the logs.
2020-06-08 12:00:58,544 - Bed12ParseWrapper-2 - bed12.py:1704 - INFO - run - Bed12ParseWrapper-2 - Started 0
2020-06-08 12:00:58,547 - Bed12ParseWrapper-3 - bed12.py:1704 - INFO - run - Bed12ParseWrapper-3 - Started 1
2020-06-08 12:00:58,550 - Bed12ParseWrapper-4 - bed12.py:1704 - INFO - run - Bed12ParseWrapper-4 - Started 2
2020-06-08 12:00:58,552 - Bed12ParseWrapper-5 - bed12.py:1704 - INFO - run - Bed12ParseWrapper-5 - Started 3

I am not sure about the Conda/PyPI version, since I have had problems with it (see #244).

Best, Basanta

lucventurini commented 3 years ago

Closing for now, as the code has changed substantially over the past year.