EI-CoreBioinformatics / mikado

Mikado is a lightweight Python3 pipeline whose purpose is to facilitate the identification of expressed loci from RNA-Seq data and to select the best models in each locus.
https://mikado.readthedocs.io/en/stable/
GNU Lesser General Public License v3.0

running error of mikado #265

Closed juntaosdu closed 4 years ago

juntaosdu commented 4 years ago

Dear authors,

Recently I was using your tool Mikado on my data. Below is my running commands:

```
mikado configure --list list.txt --reference genome.fa configuration.yaml
mikado prepare --json-conf configuration.yaml
mikado serialise --json-conf configuration.yaml
mikado pick --json-conf configuration.yaml
```

When I finished `mikado serialise`, it did not generate the file mikado.db. Then I ran `mikado pick`, which produced the following:

```
--- Logging error ---
Traceback (most recent call last):
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local3/lib/python3.6/logging/__init__.py", line 992, in emit
    msg = self.format(record)
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local3/lib/python3.6/logging/__init__.py", line 838, in format
    return fmt.format(record)
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local3/lib/python3.6/logging/__init__.py", line 575, in format
    record.message = record.getMessage()
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local3/lib/python3.6/logging/__init__.py", line 338, in getMessage
    msg = msg % self.args
TypeError: not all arguments converted during string formatting
Call stack:
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local3/bin/mikado", line 8, in <module>
    sys.exit(main())
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local3/lib/python3.6/site-packages/Mikado/__init__.py", line 106, in main
    args.func(args)
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local3/lib/python3.6/site-packages/Mikado/subprograms/prepare.py", line 165, in prepare_launcher
    prepare(args, logger)
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local3/lib/python3.6/site-packages/Mikado/preparation/prepare.py", line 428, in prepare
    perform_check(sorter(shelf_stacks), shelf_stacks, args, logger)
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local3/lib/python3.6/site-packages/Mikado/preparation/prepare.py", line 165, in perform_check
    strand_specific=tobj["strand_specific"])
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local3/lib/python3.6/site-packages/Mikado/preparation/checking.py", line 78, in create_transcript
    transcript_object.check_strand()
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local3/lib/python3.6/site-packages/Mikado/transcripts/transcriptchecker.py", line 216, in check_strand
    canonical_counter["-"])
Message: 'Transcript %s has been assigned to the wrong strand, tagging it but leaving it on this strand.'
Arguments: ('sca_bundle.18040.0.4', 0, 1)
......
```
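The logging error itself is a mismatch between the format string and its arguments: the message has a single `%s` placeholder, but three arguments are supplied, so the `msg % self.args` interpolation inside the `logging` module fails. A minimal standalone reproduction (not Mikado code, just the mechanism):

```python
# The message has one %s placeholder, as in the log above (shortened here).
msg = "Transcript %s has been assigned to the wrong strand."

# One placeholder, one argument: interpolation succeeds.
assert msg % ("sca_bundle.18040.0.4",) == (
    "Transcript sca_bundle.18040.0.4 has been assigned to the wrong strand."
)

# One placeholder, three arguments: the extras cannot be consumed, and the
# logging handler reports exactly this TypeError as "--- Logging error ---".
try:
    msg % ("sca_bundle.18040.0.4", 0, 1)
except TypeError as exc:
    print(exc)  # not all arguments converted during string formatting
```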

Could you please help handle this?

Thank you very much!

juntaosdu commented 4 years ago

I get an AttributeError when I try the latest version.

lucventurini commented 4 years ago

Dear @juntaosdu , many thanks for reporting this. Would you please be able to attach here the logs of all steps (prepare, serialise, pick)? Unfortunately the log you pasted above is truncated, preventing us from understanding where the problem lies.

Kind regards

juntaosdu commented 4 years ago

=======Below is the output information while mikado was running=======

```
/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/lib/python3.6/site-packages/Mikado/configuration/configurator.py:529: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  scoring = yaml.load(scoring_file)
[the same YAMLLoadWarning is printed three more times]
2020-02-10 03:56:33,662 - main - __init__.py:124 - ERROR - main - MainProcess - Mikado crashed, cause:
2020-02-10 03:56:33,662 - main - __init__.py:125 - ERROR - main - MainProcess - 'DiGraph' object has no attribute 'add_path'
Traceback (most recent call last):
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/lib/python3.6/site-packages/Mikado/__init__.py", line 110, in main
    args.func(args)
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/lib/python3.6/site-packages/Mikado/subprograms/pick.py", line 152, in pick
    creator()
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/lib/python3.6/site-packages/Mikado/picking/picker.py", line 1156, in __call__
    self._parse_and_submit_input(data_dict)
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/lib/python3.6/site-packages/Mikado/picking/picker.py", line 1129, in _parse_and_submit_input
    self.__submit_single_threaded(data_dict)
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/lib/python3.6/site-packages/Mikado/picking/picker.py", line 1053, in __submit_single_threaded
    source=self.json_conf["pick"]["output_format"]["source"])
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/lib/python3.6/site-packages/Mikado/loci/superlocus.py", line 149, in __init__
    super().add_transcript_to_locus(transcript_instance)
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/lib/python3.6/site-packages/Mikado/loci/abstractlocus.py", line 440, in add_transcript_to_locus
    self.add_path_to_graph(transcript, self._internal_graph)
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/lib/python3.6/site-packages/Mikado/loci/abstractlocus.py", line 1154, in add_path_to_graph
    graph.add_path(segments)
AttributeError: 'DiGraph' object has no attribute 'add_path'
```

======Below are the log files======

1. `prepare.log`:

   ```
   2020-02-10 03:56:24,567 - prepare - prepare.py:67 - INFO - setup - MainProcess - Command line: /storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/bin/mikado prepare --json-conf configuration.yaml
   2020-02-10 03:56:26,214 - prepare - prepare.py:447 - INFO - prepare - MainProcess - Finished
   ```

2. `serialise.log`:

   ```
   2020-02-10 03:56:29,646 - serialiser - serialise.py:268 - INFO - setup - MainProcess - Command line: /storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/bin/mikado serialise --json-conf configuration.yaml
   ```

3. `mikado_pick.log`:

   ```
   2020-02-10 03:56:33,167 - main_logger - picker.py:320 - INFO - setup_logger - MainProcess - Begun analysis of mikado_prepared.gtf
   2020-02-10 03:56:33,168 - main_logger - picker.py:322 - INFO - setup_logger - MainProcess - Command line: /storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/bin/mikado pick --json-conf configuration.yaml
   2020-02-10 03:56:33,168 - main_logger - picker.py:236 - INFO - setup_shm_db - MainProcess - Copy into a SHM db: False
   2020-02-10 03:56:33,169 - listener - picker.py:339 - WARNING - setup_logger - MainProcess - Current level for queue: WARNING
   2020-02-10 03:56:33,444 - listener - dbutils.py:54 - WARNING - create_connector - MainProcess - No database found, creating a mock one!
   ```

lucventurini commented 4 years ago

Dear @juntaosdu , many thanks. The bug is triggered by changes in a library package that Mikado uses. Our fault, we should have updated the code. I will be issuing a bug fix within the next hour; please watch this issue.
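For context (an editorial note, not part of the original reply): the library in question is networkx, whose `DiGraph` class appears in the traceback. networkx 2.4 removed the deprecated `Graph.add_path` method in favour of the module-level `networkx.add_path(G, nodes)` function. The semantics are simply "add an edge between every pair of consecutive nodes"; a dependency-free sketch of that behaviour:

```python
def add_path(edges, nodes):
    """Append the consecutive-pair edges of a path to an edge list.

    Mimics what add_path does on a directed graph:
    [a, b, c] becomes the edges (a, b) and (b, c).
    """
    nodes = list(nodes)
    edges.extend(zip(nodes, nodes[1:]))
    return edges

print(add_path([], ["exon1", "exon2", "exon3"]))
# [('exon1', 'exon2'), ('exon2', 'exon3')]
```

In code that must run on both old and new networkx, replacing `graph.add_path(segments)` with `networkx.add_path(graph, segments)` is the usual migration.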

Kind regards

juntaosdu commented 4 years ago

By the way, is it normal that mikado serialise did not generate mikado.db with the commands I ran?

lucventurini commented 4 years ago

Dear @juntaosdu , on reviewing, we fixed this specific issue over a year ago (see here).

I think the issue is that the last official Mikado release on PyPI is very old, as we are struggling to finalise version 2. This is starting to cause problems like the one you highlighted, and it is on us.

If I may, the best solution I can recommend to solve the issue is as follows:

```
conda create -n mikado1 -c bioconda -- mikado==1.0.2
```

Please see next comment for your last question.

lucventurini commented 4 years ago

> By the way, is it normal that mikado serialise did not generate the mikado.db by my running commands?

No, it is not; but I think the problem here is that mikado serialise needs additional data (junctions, BLAST results, ORFs) which have to be generated between prepare and serialise.

For this reason, we wrote a pipeline wrapper (based on Snakemake) to perform all the steps (see here for a tutorial).

Briefly I would do the following, after installing:

```
mikado configure --daijin --list list.txt --reference genome.fa --blast_targets <PROTEINS SIMILAR TO YOUR ORGANISM> configuration.yaml
daijin -nd configuration.yaml
```

If you are using the latest version from here on GitHub, please consider the possibility of using:

```
daijin --use-conda -nd configuration.yaml
```

as this will ensure that the required software is installed.

I hope this helps.

juntaosdu commented 4 years ago

It generates the following when I type `daijin -nd configuration.yaml`:

```
usage: A Directed Acyclic pipeline for gene model reconstruction from RNA seq data. Basically, a pipeline for driving Mikado. It will first align RNAseq reads against a genome using multiple tools, then creates transcript assemblies using multiple tools, and find junctions in the alignments using Portcullis. This input is then passed into Mikado.
       [-h] {configure,assemble,mikado} ...
A Directed Acyclic pipeline for gene model reconstruction from RNA seq data. Basically, a pipeline for driving Mikado. It will first align RNAseq reads against a genome using multiple tools, then creates transcript assemblies using multiple tools, and find junctions in the alignments using Portcullis. This input is then passed into Mikado.: error: invalid choice: 'configuration.yaml' (choose from 'configure', 'assemble', 'mikado')
```
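(Editorial note: this is argparse rejecting `configuration.yaml` because `daijin` expects one of its subcommands as the first positional argument. A minimal sketch of the mechanism, with the subcommand names taken from the error message above:)

```python
import argparse

# Sketch of a CLI with the same three subcommands daijin advertises.
parser = argparse.ArgumentParser(prog="daijin")
sub = parser.add_subparsers(dest="command")
for name in ("configure", "assemble", "mikado"):
    sub.add_parser(name)

# A valid subcommand parses fine.
assert parser.parse_args(["mikado"]).command == "mikado"

# Any other first token is an "invalid choice" error;
# argparse prints usage and exits with status 2.
try:
    parser.parse_args(["configuration.yaml"])
except SystemExit as exc:
    print("exit status:", exc.code)  # exit status: 2
```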

lucventurini commented 4 years ago

My fault, it should have been:

```
daijin mikado -nd configuration.yaml
```

juntaosdu commented 4 years ago

I got the following information again:

```
Provided cores: 1000
Rules claiming more threads will be scaled down.
Job counts:
    count  jobs
    1      all
    1      blast_all
    1      genome_index
    1      mikado_collect_stats
    1      mikado_pick
    1      mikado_prepare
    1      mikado_serialise
    1      mikado_stats
    1      prodigal
    9

Job 5: Preparing transcripts using mikado

 mikado prepare --start-method=spawn --procs=1 --fasta=genome.fa --json-conf=/scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/Spkied-in/SRR3743144/Mikado/mikadoakkicyfp.yaml -od Daijin/5-mikado 2>&1

rule blast_all:
    output: Daijin/5-mikado/blast/blastx.all.done
    jobid: 6

/bin/bash: mikado: command not found
touch Daijin/5-mikado/blast/blastx.all.done

Job 8: Using samtools to index genome

ln -sf /scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/Spkied-in/SRR3743144/Mikado/genome.fa Daijin/5-mikado/genome.fa && touch -h Daijin/5-mikado/genome.fa && samtools faidx Daijin/5-mikado/genome.fa
/bin/bash: samtools: command not found
Finished job 6. 1 of 9 steps (11%) done
Error in job mikado_prepare while creating output files Daijin/5-mikado/mikado_prepared.gtf, Daijin/5-mikado/mikado_prepared.fasta.
Error in job genome_index while creating output file Daijin/5-mikado/genome.fa.fai.
RuleException: CalledProcessError in line 86 of /storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/lib/python3.6/site-packages/Mikado/daijin/mikado.snakefile:
Command ' mikado prepare --start-method=spawn --procs=1 --fasta=genome.fa --json-conf=/scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/Spkied-in/SRR3743144/Mikado/mikadoakkicyfp.yaml -od Daijin/5-mikado 2>&1' returned non-zero exit status 127.
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/lib/python3.6/site-packages/Mikado/daijin/mikado.snakefile", line 86, in __rule_mikado_prepare
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/lib/python3.6/concurrent/futures/thread.py", line 55, in run
RuleException: CalledProcessError in line 230 of /storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/lib/python3.6/site-packages/Mikado/daijin/mikado.snakefile:
Command 'ln -sf /scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/Spkied-in/SRR3743144/Mikado/genome.fa Daijin/5-mikado/genome.fa && touch -h Daijin/5-mikado/genome.fa && samtools faidx Daijin/5-mikado/genome.fa' returned non-zero exit status 127.
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/lib/python3.6/site-packages/Mikado/daijin/mikado.snakefile", line 230, in __rule_genome_index
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/lib/python3.6/concurrent/futures/thread.py", line 55, in run
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message
```
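(Editorial note: exit status 127 is the shell's conventional code for "command not found", which is why both failing rules report it. A quick way to confirm, assuming `/bin/bash` is available:)

```python
import subprocess

# Running a nonexistent command through bash yields exit status 127,
# the same code Snakemake reports for the failed rules above.
result = subprocess.run(
    ["/bin/bash", "-c", "definitely_not_a_real_command_xyz"],
    capture_output=True,
)
print(result.returncode)  # 127
```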

lucventurini commented 4 years ago

Dear @juntaosdu , something has gone wrong in installing Mikado. The errors:

```
/bin/bash: mikado: command not found
/bin/bash: samtools: command not found
```

indicate that samtools and mikado are not available when running the pipeline. Especially for mikado, this is puzzling.

May I ask exactly which method of installation (and which version of Mikado) you are using, so that I can try to replicate here?
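(Editorial note: a quick sanity check before relaunching Daijin is to verify that the required executables are actually on `PATH`. A small sketch, with the tool names taken from the errors above:)

```python
import shutil

# Daijin shells out to these tools; any "NOT FOUND" here would reproduce
# the "command not found" failures in the log above.
for tool in ("mikado", "samtools", "prodigal"):
    found = shutil.which(tool)
    print(f"{tool}: {found or 'NOT FOUND'}")
```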

juntaosdu commented 4 years ago

```
./bin/pip3 install mikado
```

juntaosdu commented 4 years ago

```
Provided cores: 1000
Rules claiming more threads will be scaled down.
Job counts:
    count  jobs
    1      all
    1      genome_index
    1      mikado_collect_stats
    1      mikado_pick
    1      mikado_prepare
    1      mikado_serialise
    1      mikado_stats
    1      prodigal
    8

Job 6: Using samtools to index genome

ln -sf /scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/Spkied-in/SRR3743144/Mikado/genome.fa Daijin/5-mikado/genome.fa && touch -h Daijin/5-mikado/genome.fa && samtools faidx Daijin/5-mikado/genome.fa

Job 4: Preparing transcripts using mikado

 mikado prepare --start-method=spawn --procs=1 --fasta=genome.fa --json-conf=/scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/Spkied-in/SRR3743144/Mikado/mikadovj4j6zgx.yaml -od Daijin/5-mikado 2>&1
Finished job 6. 1 of 8 steps (12%) done
/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/lib/python3.6/site-packages/Mikado/configuration/configurator.py:529: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  scoring = yaml.load(scoring_file)
Finished job 4. 2 of 8 steps (25%) done

Job 8: Running PRODIGAL on Mikado prepared transcripts: Daijin/5-mikado/mikado_prepared.fasta

 mkdir -p /scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/Spkied-in/SRR3743144/Mikado/Daijin/5-mikado/prodigal && cd /scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/Spkied-in/SRR3743144/Mikado/Daijin/5-mikado/prodigal && ln -sf /scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/Spkied-in/SRR3743144/Mikado/Daijin/5-mikado/mikado_prepared.fasta transcripts.fasta && prodigal -f gff -g 1 -i transcripts.fasta -o transcripts.fasta.prodigal.gff3 > /scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/Spkied-in/SRR3743144/Mikado/Daijin/5-mikado/prodigal/prodigal.log 2>&1
Error in job prodigal while creating output file Daijin/5-mikado/prodigal/transcripts.fasta.prodigal.gff3.
RuleException: CalledProcessError in line 220 of /storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/lib/python3.6/site-packages/Mikado/daijin/mikado.snakefile:
Command ' mkdir -p /scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/Spkied-in/SRR3743144/Mikado/Daijin/5-mikado/prodigal && cd /scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/Spkied-in/SRR3743144/Mikado/Daijin/5-mikado/prodigal && ln -sf /scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/Spkied-in/SRR3743144/Mikado/Daijin/5-mikado/mikado_prepared.fasta transcripts.fasta && prodigal -f gff -g 1 -i transcripts.fasta -o transcripts.fasta.prodigal.gff3 > /scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/Spkied-in/SRR3743144/Mikado/Daijin/5-mikado/prodigal/prodigal.log 2>&1' returned non-zero exit status 127.
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/lib/python3.6/site-packages/Mikado/daijin/mikado.snakefile", line 220, in __rule_prodigal
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/lib/python3.6/concurrent/futures/thread.py", line 55, in run
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message
```

lucventurini commented 4 years ago

Dear @juntaosdu , this is progress ... the missing program is Prodigal (https://github.com/hyattpd/Prodigal). Once it is installed, Daijin should have all that is needed to run all the steps of Mikado.

juntaosdu commented 4 years ago

```
Provided cores: 1000
Rules claiming more threads will be scaled down.
Job counts:
    count  jobs
    1      all
    1      mikado_collect_stats
    1      mikado_pick
    1      mikado_serialise
    1      mikado_stats
    1      prodigal
    6

Job 6: Running PRODIGAL on Mikado prepared transcripts: Daijin/5-mikado/mikado_prepared.fasta

 mkdir -p /scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/Spkied-in/SRR3743144/Mikado/Daijin/5-mikado/prodigal && cd /scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/Spkied-in/SRR3743144/Mikado/Daijin/5-mikado/prodigal && ln -sf /scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/Spkied-in/SRR3743144/Mikado/Daijin/5-mikado/mikado_prepared.fasta transcripts.fasta && prodigal -f gff -g 1 -i transcripts.fasta -o transcripts.fasta.prodigal.gff3 > /scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/Spkied-in/SRR3743144/Mikado/Daijin/5-mikado/prodigal/prodigal.log 2>&1
Finished job 6. 1 of 6 steps (17%) done

Job 4: Running Mikado serialise to move numerous data sources into a single database

 mikado serialise --start-method=spawn --transcripts=Daijin/5-mikado/mikado_prepared.fasta --genome_fai=Daijin/5-mikado/genome.fa.fai --json-conf=/scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/Spkied-in/SRR3743144/Mikado/mikadovfafdkp3.yaml --force --orfs=Daijin/5-mikado/prodigal/transcripts.fasta.prodigal.gff3 -od Daijin/5-mikado --procs=1 > Daijin/5-mikado/mikado_serialise.err 2>&1
Finished job 4. 2 of 6 steps (33%) done

Job 3: Running mikado picking stage

 mikado pick --source Mikado_permissive --mode=permissive --procs=1 --start-method=spawn --json-conf=/scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/Spkied-in/SRR3743144/Mikado/mikadovfafdkp3.yaml -od Daijin/5-mikado/pick/permissive --loci_out mikado-permissive.loci.gff3 -lv INFO Daijin/5-mikado/mikado_prepared.gtf -db Daijin/5-mikado/mikado.db > Daijin/5-mikado/pick/permissive/mikado-permissive.pick.err 2>&1
Error in job mikado_pick while creating output file Daijin/5-mikado/pick/permissive/mikado-permissive.loci.gff3.
RuleException: CalledProcessError in line 269 of /storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/lib/python3.6/site-packages/Mikado/daijin/mikado.snakefile:
Command ' mikado pick --source Mikado_permissive --mode=permissive --procs=1 --start-method=spawn --json-conf=/scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/Spkied-in/SRR3743144/Mikado/mikadovfafdkp3.yaml -od Daijin/5-mikado/pick/permissive --loci_out mikado-permissive.loci.gff3 -lv INFO Daijin/5-mikado/mikado_prepared.gtf -db Daijin/5-mikado/mikado.db > Daijin/5-mikado/pick/permissive/mikado-permissive.pick.err 2>&1' returned non-zero exit status 1.
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/lib/python3.6/site-packages/Mikado/daijin/mikado.snakefile", line 269, in __rule_mikado_pick
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/lib/python3.6/concurrent/futures/thread.py", line 55, in run
Removing output files of failed job mikado_pick since they might be corrupted: Daijin/5-mikado/pick/permissive/mikado-permissive.loci.gff3
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message
```

lucventurini commented 4 years ago

Dear @juntaosdu , could you please attach (or copy here) the contents of `Daijin/5-mikado/pick/permissive/mikado-permissive.pick.err`?

Details on the crash should be there.

juntaosdu commented 4 years ago

I reinstalled mikado via conda, and it seems correct. Then I ran `daijin mikado -nd configuration.yaml`, which also worked. Then I ran `mikado prepare --json-conf configuration.yaml` and `mikado serialise --json-conf configuration.yaml`, and there is still no mikado.db file. Then I ran `mikado pick --json-conf configuration.yaml` and got the following:

```
--- Logging error ---
Traceback (most recent call last):
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/ENTER/envs/mikado2/lib/python3.7/logging/__init__.py", line 1034, in emit
    msg = self.format(record)
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/ENTER/envs/mikado2/lib/python3.7/logging/__init__.py", line 880, in format
    return fmt.format(record)
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/ENTER/envs/mikado2/lib/python3.7/logging/__init__.py", line 619, in format
    record.message = record.getMessage()
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/ENTER/envs/mikado2/lib/python3.7/logging/__init__.py", line 380, in getMessage
    msg = msg % self.args
TypeError: not all arguments converted during string formatting
Call stack:
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/ENTER/envs/mikado2/bin/mikado", line 8, in <module>
    sys.exit(main())
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/ENTER/envs/mikado2/lib/python3.7/site-packages/Mikado/__init__.py", line 106, in main
    args.func(args)
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/ENTER/envs/mikado2/lib/python3.7/site-packages/Mikado/subprograms/pick.py", line 201, in pick
    args = check_run_options(args, logger=logger)
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/ENTER/envs/mikado2/lib/python3.7/site-packages/Mikado/subprograms/pick.py", line 103, in check_run_options
    logger.critical("Mikado database {} not found. Exiting.", args.sqlite_db)
Message: 'Mikado database {} not found. Exiting.'
Arguments: (None,)
```
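(Editorial note: this particular logging error is a small bug in the error path itself — the message uses a `{}`-style placeholder, but the standard `logging` module interpolates arguments with `%`-style formatting, so the `msg % self.args` step fails. A minimal illustration:)

```python
import logging

logging.basicConfig()
logger = logging.getLogger("demo")

# Buggy: {}-style placeholder with a lazy argument. logging does msg % args
# internally, so the unconsumed argument triggers "--- Logging error ---"
# in the handler (the real message never reaches the log).
logger.critical("Mikado database {} not found. Exiting.", "mikado.db")

# Fixed: either %-style lazy formatting ...
logger.critical("Mikado database %s not found. Exiting.", "mikado.db")
# ... or pre-formatting the string before handing it to the logger.
logger.critical("Mikado database {} not found. Exiting.".format("mikado.db"))
```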

juntaosdu commented 4 years ago

However, I found a file at "Daijin/5-mikado/pick/permissive/mikado-permissive.loci.gff3". Is this the final output assembly from Mikado?

juntaosdu commented 4 years ago

Trouble again! When I switched to another dataset, I got the following error:

```
Building DAG of jobs...
Creating conda environment /storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/ENTER/envs/mikado2/lib/python3.7/site-packages/Mikado/daijin/envs/samtools.yaml...
Downloading and installing remote packages.
Environment for ../../../../../../../../storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/ENTER/envs/mikado2/lib/python3.7/site-packages/Mikado/daijin/envs/samtools.yaml created (location: .snakemake/conda/c618eb7f)
Creating conda environment /storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/ENTER/envs/mikado2/lib/python3.7/site-packages/Mikado/daijin/envs/prodigal.yaml...
Downloading and installing remote packages.
Environment for ../../../../../../../../storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/ENTER/envs/mikado2/lib/python3.7/site-packages/Mikado/daijin/envs/prodigal.yaml created (location: .snakemake/conda/8d2bd5fa)
Using shell: /bin/bash
Provided cores: 1000
Rules claiming more threads will be scaled down.
Job counts:
    count  jobs
    1      all
    1      blast_all
    1      genome_index
    1      mikado_collect_stats
    1      mikado_pick
    1      mikado_prepare
    1      mikado_serialise
    1      mikado_stats
    1      prodigal
    9
```

```
[Mon Feb 10 07:44:41 2020]
Job 4: Preparing transcripts using mikado
Reason: Missing output files: Daijin/5-mikado/mikado_prepared.fasta, Daijin/5-mikado/mikado_prepared.gtf

 mikado prepare -l Daijin/5-mikado/mikado_prepare.log --start-method=spawn --fasta=/storage/juntaosdu/yuting/reference_gtf/Human_genome.fa --json-conf=/scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/S150/Mikado/mikadols9s9bb5.yaml -od Daijin/5-mikado 2>&1

[Mon Feb 10 07:44:41 2020]
rule blast_all:
    output: Daijin/5-mikado/blast/blastx.all.done
    jobid: 6
    reason: Missing output files: Daijin/5-mikado/blast/blastx.all.done

touch Daijin/5-mikado/blast/blastx.all.done

[Mon Feb 10 07:44:41 2020]
Job 8: Using samtools to index genome
Reason: Missing output files: Daijin/5-mikado/Human_genome.fa.fai

ln -sf /storage/juntaosdu/yuting/reference_gtf/Human_genome.fa Daijin/5-mikado/Human_genome.fa && touch -h Daijin/5-mikado/Human_genome.fa && samtools faidx Daijin/5-mikado/Human_genome.fa
[Mon Feb 10 07:44:41 2020] Finished job 6. 1 of 9 steps (11%) done
Activating conda environment: /scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/S150/Mikado/.snakemake/conda/c618eb7f
[Mon Feb 10 07:44:58 2020] Finished job 8. 2 of 9 steps (22%) done
[Mon Feb 10 07:47:07 2020] Finished job 4. 3 of 9 steps (33%) done

[Mon Feb 10 07:47:07 2020]
Job 7: Running PRODIGAL on Mikado prepared transcripts: Daijin/5-mikado/mikado_prepared.fasta
Reason: Missing output files: Daijin/5-mikado/prodigal/transcripts.fasta.prodigal.gff3; Input files updated by another job: Daijin/5-mikado/mikado_prepared.fasta

 mkdir -p /scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/S150/Mikado/Daijin/5-mikado/prodigal && cd /scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/S150/Mikado/Daijin/5-mikado/prodigal && ln -sf /scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/S150/Mikado/Daijin/5-mikado/mikado_prepared.fasta transcripts.fasta && prodigal -f gff -g 1 -i transcripts.fasta -o transcripts.fasta.prodigal.gff3 > /scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/S150/Mikado/Daijin/5-mikado/prodigal/prodigal.log 2>&1
Activating conda environment: /scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/S150/Mikado/.snakemake/conda/8d2bd5fa
[Mon Feb 10 07:48:49 2020] Finished job 7. 4 of 9 steps (44%) done

[Mon Feb 10 07:48:49 2020]
rule mikado_serialise:
    input: Daijin/5-mikado/blast/blastx.all.done, Daijin/5-mikado/prodigal/transcripts.fasta.prodigal.gff3, Daijin/5-mikado/Human_genome.fa.fai, Daijin/5-mikado/mikado_prepared.fasta
    output: Daijin/5-mikado/mikado.db
    log: Daijin/5-mikado/mikado_serialise.log
    jobid: 5
    reason: Missing output files: Daijin/5-mikado/mikado.db; Input files updated by another job: Daijin/5-mikado/mikado_prepared.fasta, Daijin/5-mikado/Human_genome.fa.fai, Daijin/5-mikado/prodigal/transcripts.fasta.prodigal.gff3, Daijin/5-mikado/blast/blastx.all.done

 mikado serialise --start-method=spawn --transcripts=Daijin/5-mikado/mikado_prepared.fasta --genome_fai=Daijin/5-mikado/Human_genome.fa.fai --json-conf=/scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/S150/Mikado/mikadols9s9bb5.yaml --force --orfs=Daijin/5-mikado/prodigal/transcripts.fasta.prodigal.gff3 -od Daijin/5-mikado --procs=1 -l Daijin/5-mikado/mikado_serialise.log
2020-02-10 07:49:07,204 - main - __init__.py:120 - ERROR - main - MainProcess - Mikado crashed, cause:
2020-02-10 07:49:07,204 - main - __init__.py:121 - ERROR - main - MainProcess - returned NULL without setting an error
Traceback (most recent call last):
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/ENTER/envs/mikado2/lib/python3.7/site-packages/Mikado/__init__.py", line 106, in main
    args.func(args)
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/ENTER/envs/mikado2/lib/python3.7/site-packages/Mikado/subprograms/serialise.py", line 379, in serialise
    load_orfs(args, logger)
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/ENTER/envs/mikado2/lib/python3.7/site-packages/Mikado/subprograms/serialise.py", line 147, in load_orfs
    serializer()
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/ENTER/envs/mikado2/lib/python3.7/site-packages/Mikado/serializers/orf.py", line 477, in __call__
    self.serialize()
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/ENTER/envs/mikado2/lib/python3.7/site-packages/Mikado/serializers/orf.py", line 467, in serialize
    self.serialize_single_thread()
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/ENTER/envs/mikado2/lib/python3.7/site-packages/Mikado/serializers/orf.py", line 306, in serialize_single_thread
    for row in self.bed12_parser:
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/ENTER/envs/mikado2/lib/python3.7/site-packages/Mikado/parsers/bed12.py", line 1360, in __next__
    return self.gff_next()
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/ENTER/envs/mikado2/lib/python3.7/site-packages/Mikado/parsers/bed12.py", line 1394, in gff_next
    line = GffLine(line)
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/ENTER/envs/mikado2/lib/python3.7/site-packages/Mikado/parsers/GFF.py", line 36, in __init__
    GFAnnotation.__init__(self, line, my_line, header=header)
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/ENTER/envs/mikado2/lib/python3.7/site-packages/Mikado/parsers/gfannotation.py", line 92, in __init__
    self.start, self.end = tuple(fastnumbers.fast_int(i) for i in self._fields[3:5])
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/ENTER/envs/mikado2/lib/python3.7/site-packages/Mikado/parsers/gfannotation.py", line 92, in <genexpr>
    self.start, self.end = tuple(fastnumbers.fast_int(i) for i in self._fields[3:5])
SystemError: returned NULL without setting an error
[Mon Feb 10 07:49:07 2020] Error in rule mikado_serialise:
    jobid: 5
    output: Daijin/5-mikado/mikado.db
    log: Daijin/5-mikado/mikado_serialise.log (check log file(s) for error message)
    shell: mikado serialise --start-method=spawn --transcripts=Daijin/5-mikado/mikado_prepared.fasta --genome_fai=Daijin/5-mikado/Human_genome.fa.fai --json-conf=/scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/S150/Mikado/mikadols9s9bb5.yaml --force --orfs=Daijin/5-mikado/prodigal/transcripts.fasta.prodigal.gff3 -od Daijin/5-mikado --procs=1 -l Daijin/5-mikado/mikado_serialise.log
    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job mikado_serialise since they might be corrupted: Daijin/5-mikado/mikado.db
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/S150/Mikado/.snakemake/log/2020-02-10T074413.461781.snakemake.log
Error: cannot open input file Daijin/5-mikado/pick/permissive/mikado-permissive.loci.gff3!
```

juntaosdu commented 4 years ago

How can I get the correct result? Thank you very much!

juntaosdu commented 4 years ago

When I tried version 1.0, it did not report any error and output two GFF files. It seems good, but it did not generate the mikado.db file. However, mikado pick printed the message "No database found, creating a mock one!". Would that be OK? Can I use these generated files? Thank you!

lucventurini commented 4 years ago

> When I tried the version 1.0, it did not report any error and output two gff files. It seems good, but it did not generate the mikado.db file. However, the mikado pick would generate a sentence "No database found, creating a mock one!". Would that be OK? Can I use these generated files? Thank you!

Dear @juntaosdu , no, unfortunately those files will not be useful, as the Mikado database, which contains important information, is not present.

Please allow me to try to summarise what happened so far, correct me whenever I am understanding incorrectly:

The last bit is puzzling. Could you please check whether the file

Daijin/5-mikado/prodigal/transcripts.fasta.prodigal.gff3

has any line, not starting with a #, where the fourth or the fifth column is not an integer? That's what's triggering the error.
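If it helps, a quick way to run this check is with a small script like the one below (this is just an illustrative helper, not Mikado code; it flags any non-comment line whose start/end columns are not plain integers):

```python
def bad_gff_lines(path):
    """Return the 1-based line numbers of GFF3 data lines whose
    fourth or fifth column (start/end) is not a plain integer."""
    bad = []
    with open(path) as handle:
        for num, line in enumerate(handle, 1):
            # Skip comment/header lines and blank lines
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            # Columns 4 and 5 (0-based indices 3 and 4) must be integers
            if len(fields) < 5 or not all(f.isdigit() for f in fields[3:5]):
                bad.append(num)
    return bad
```

Running `bad_gff_lines("transcripts.fasta.prodigal.gff3")` should return an empty list if the file is clean.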

juntaosdu commented 4 years ago

Hi, I have checked the .gff3 file and found that, in every line not starting with a #, both the fourth and fifth columns are integers. Is there anything else I need to do?

lucventurini commented 4 years ago

Dear @juntaosdu , thank you for trying that. I have a further hypothesis, i.e., that nothing is wrong with the GFF or Mikado, but that this is a bug (fixed upstream) in the fastnumbers package. Could you please try to:

pip install -U "fastnumbers>=3.0"

within the mikado2 conda environment?

I am also considering removing this dependency completely and going back to using the functions included with Python3; the author of the library himself reported that recent interpreter improvements make his project a little bit redundant on Python 3.7 and above.
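For illustration, falling back to the built-ins could be as simple as the sketch below (a hypothetical wrapper, not Mikado's actual code; `fastnumbers.fast_int` returns its input unchanged on failure by default, whereas the built-in `int()` raises, so a small shim preserves that behaviour):

```python
def safe_int(value):
    """Behave like fastnumbers.fast_int with default arguments:
    return the parsed integer, or the original value on failure."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return value

# Parsing GFF start/end coordinates with the wrapper
start, end = (safe_int(i) for i in ("100", "2500"))
```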

juntaosdu commented 4 years ago

I tried pip install -U "fastnumbers>=3.0" and it installed successfully. However, when I ran daijin mikado -nd configuration.yaml, I received the same error report. What should I do next?

lucventurini commented 4 years ago

Dear @juntaosdu , many thanks for trying this.

I am currently puzzled by what is actually happening. I think that the best way forward would be for you to send me, privately, your files so I can diagnose the issue more quickly. My email is lucventurini AT gmail DOT com. I would need the configuration file, the mikado_prepared.fasta and the Prodigal GFF3 for this. With those, I should be able to diagnose what is triggering the bug. I will treat the data confidentially and delete it after diagnosing and fixing the issue.

As an alternative solution or workaround, you can change the use_transdecoder field in the configuration file from false to true, delete the prodigal folder, then relaunch daijin. This will use TransDecoder rather than Prodigal, which has a different output format (BED12) and therefore should not trigger this bug at all.
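For clarity, the edit is just this one key in the daijin configuration file (the surrounding section is omitted here, as the exact layout can vary between Mikado versions):

```yaml
# Use TransDecoder (BED12 output) instead of Prodigal for ORF calling
use_transdecoder: true
```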

If you would like instead to help diagnose and fix this bug for good, there are a couple of things that can be done:

Many thanks for your patience.

juntaosdu commented 4 years ago

It is running and seems good. By the way, the file Daijin/5-mikado/pick/permissive/mikado-permissive.loci.gff3 is the final output file?

lucventurini commented 4 years ago

the file Daijin/5-mikado/pick/permissive/mikado-permissive.loci.gff3 is the final output file?

Yes, those are the final selected transcripts. Glad to hear that it is managing to get past the error.

May I ask which option you chose to pursue (TransDecoder, or the bug fixes that I suggested in the bullet points)? This would help pin down the bug.

Kind regards

juntaosdu commented 4 years ago

The bug fixes.

lucventurini commented 4 years ago

Thank you. If you could check serialise.log after the step has finished, to see which lines (if any) are creating an issue, I would be grateful.

I will probably remove fastnumbers from that step. You are not the first to report a bug there.

Thank you.

juntaosdu commented 4 years ago

How do I check the serialise.log file?

lucventurini commented 4 years ago

I meant this file within the working directory: Daijin/5-mikado/mikado_serialise.log

Apologies for not having been clearer.

juntaosdu commented 4 years ago

It has finished running on my simulated dataset successfully! I used cuffcompare to evaluate its performance, and the results showed that it performed really badly. For example, StringTie correctly assembled 10091 transcripts, while the number for Mikado is only 4411. Is there anything wrong?

juntaosdu commented 4 years ago

I ran Mikado by combining the assemblies from StringTie, StringTie2, Scallop, and Cufflinks.

lucventurini commented 4 years ago

Dear @juntaosdu , for starters, glad to hear that the run finished successfully!

Now, regarding performance ... I am sorry to hear that it is far lower than you expected. However, I would need some more details on this to help. In order:

On a more general note, Mikado tries to recover transcripts that are protein coding or at most lncRNAs. When we simulated our data for H. sapiens, for example, we excluded many biotypes from the simulation. We also do not consider the original alignments, and we did not define correctness in terms of congruence with the original data. Specifically, if at a locus the most expressed transcript is a retained-intron event that breaks the CDS, with very little expression for a transcript with the correct splicing, Mikado will try to bring back the correct transcript without the retained intron, even if it has far less read support.

I have seen your TransBorrow tool on SourceForge (it looks like a very interesting project!), but it seems to have a very different philosophy, i.e. it seems (to my eye) to try to reconstruct the best transcripts given the alignment data. This is very valuable, but quite different from what Mikado tries to do (i.e. trying to recover good gene models from the various options, regardless of which one has the most read support, starting from a preconception of how a gene model should look).

juntaosdu commented 4 years ago

What should I do if I want to test Mikado on human RNA-seq data using only the assemblies from other assemblers, like StringTie?

lucventurini commented 4 years ago

Dear @juntaosdu , then I would do the following, editing the configuration file:

I would stress, though, that Mikado is meant to integrate BLAST and junction analysis results together with the ORF data; it will not operate at its best without those.

juntaosdu commented 4 years ago

Thank you very much for the helpful advice!

swarbred commented 4 years ago

Hi @juntaosdu As Luca indicated, Mikado probably isn't directly comparable to your TransBorrow tool. It's not clear from the above comments how you are running Mikado; the serialise step integrates multiple data sources, i.e. ORFs (Prodigal), alignments (DIAMOND) and junctions (Portcullis). While you can functionally run Mikado with just one of these inputs, you would not be advised to, and the results will be poor. If you have assemblies from multiple transcript assemblers for human, then the fair use of Mikado would be to read https://mikado.readthedocs.io/en/latest/Tutorial/index.html, use the mammalian.yaml scoring, and then:

1. run mikado prepare
2. run Prodigal (prodigal -g 1) on the mikado prepare fasta
3. create a mammalian DIAMOND db of a few related species, e.g. mouse, chimp etc., and align the mikado prepare fasta to that
4. run Portcullis on the BAM of the aligned RNA-Seq reads to get the BED junctions file that passes filters
5. run mikado serialise with the above inputs
6. run mikado pick
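As a rough sketch, the steps above could look like the commands below. All file names (related_proteins.fasta, aligned.bam, the Portcullis output path, etc.) are placeholders, and the exact flags for loading the DIAMOND results into serialise have varied between Mikado versions, so check the documentation for your release:

```shell
# 1. Prepare the combined transcript set (list.txt from mikado configure)
mikado prepare --json-conf configuration.yaml

# 2. Call ORFs on the prepared transcripts with Prodigal
#    (-g 1: standard genetic code, as suggested above)
prodigal -i mikado_prepared.fasta -g 1 -f gff -o mikado_prepared.prodigal.gff3

# 3. Build a DIAMOND database of proteins from a few related species
#    (related_proteins.fasta is a placeholder) and align to it
diamond makedb --in related_proteins.fasta --db related
diamond blastx --db related --query mikado_prepared.fasta \
    --outfmt 5 --out mikado_prepared.blast.xml

# 4. Junctions from the RNA-Seq alignment (aligned.bam is a placeholder)
portcullis full genome.fa aligned.bam

# 5. Load ORFs, homology hits and junctions into the Mikado database
mikado serialise --json-conf configuration.yaml \
    --orfs mikado_prepared.prodigal.gff3 \
    --xml mikado_prepared.blast.xml \
    --junctions portcullis_out/3-filt/portcullis_filtered.pass.junctions.bed

# 6. Select the final transcripts
mikado pick --json-conf configuration.yaml
```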

Mikado is a framework that allows you to select from a pool of transcripts based on user requirements; it's not designed to be run with only transcript assemblies and no additional data.

swarbred commented 4 years ago

Also @juntaosdu

If you run without junctions (NOT ADVISED), then you need to update the mammalian.yaml scoring file, as we only retain transcripts with some junction support, i.e. remove the proportion_verified_introns_inlocus bits from the requirements section:

expression: [(combined_cds_fraction.ncrna or combined_cds_fraction.coding) and ((exon_num.multi and (cdna_length.multi or combined_cds_length.multi) and max_intron_length and min_intron_length and proportion_verified_introns_inlocus), or, (exon_num.mono and (combined_cds_length.mono or cdna_length.mono)))]

proportion_verified_introns_inlocus: {operator: gt, value: 0}
juntaosdu commented 4 years ago

thank you

juntaosdu commented 4 years ago

Hi, could you please show me the detailed commands for each of the above steps?

juntaosdu commented 4 years ago

You mentioned that "remove the proportion_verified_introns_inlocus bits from the requirements section". How to do this?

swarbred commented 4 years ago

Hi @juntaosdu https://mikado.readthedocs.io/en/latest/Tutorial/index.html covers the mikado steps

start with mikado configure i.e. first generate the list.txt file

mikado configure --list list.txt --reference myref.fa --mode stringent --scoring mammalian.yaml  --copy-scoring mammalian.yaml configuration.toml

then

mikado prepare -p <number of processes> --json-conf configuration.toml

You mentioned that "remove the proportion_verified_introns_inlocus bits from the requirements section". How to do this?

this relates to the scoring file mammalian.yaml that mikado configure will have copied to your working directory. BUT this only needs to be changed if you DON'T generate the junctions.

if you run pick without junctions then change the top section of the scoring file to

# Scoring file suitable for any species with intron sizes similar to mammals
requirements:
  expression: [(combined_cds_fraction.ncrna or combined_cds_fraction.coding) and ((exon_num.multi and (cdna_length.multi or combined_cds_length.multi) and max_intron_length and min_intron_length), or, (exon_num.mono and (combined_cds_length.mono or cdna_length.mono)))]
  parameters:
    combined_cds_fraction.ncrna: {operator: eq, value: 0}
    combined_cds_fraction.coding: {operator: gt, value: 0.30}
    cdna_length.mono: {operator: gt, value: 400}
    cdna_length.multi: {operator: ge, value: 300}
    combined_cds_length.mono: {operator: gt, value: 225}
    combined_cds_length.multi: {operator: gt, value: 150}
    exon_num.mono: {operator: eq, value: 1}
    exon_num.multi: {operator: gt, value: 1}
    max_intron_length: {operator: le, value: 1000000}
    min_intron_length: {operator: ge, value: 5}

i.e. remove the requirement for transcripts to have junction support (AGAIN NOT ADVISED)