AttributeError when I try the latest version.
Dear @juntaosdu, many thanks for reporting this.
Would you please be able to attach here the logs of all steps (prepare, serialise, pick)? Unfortunately the log you pasted above is truncated, preventing us from understanding where the problem lies.
Kind regards
=======Below is the output information while mikado was running=======

/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/lib/python3.6/site-packages/Mikado/configuration/configurator.py:529: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  scoring = yaml.load(scoring_file)
(the warning above is printed four times)

2020-02-10 03:56:33,662 - main - __init__.py:124 - ERROR - main - MainProcess - Mikado crashed, cause:
2020-02-10 03:56:33,662 - main - __init__.py:125 - ERROR - main - MainProcess - 'DiGraph' object has no attribute 'add_path'
Traceback (most recent call last):
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/lib/python3.6/site-packages/Mikado/__init__.py", line 110, in main
    args.func(args)
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/lib/python3.6/site-packages/Mikado/subprograms/pick.py", line 152, in pick
    creator()
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/lib/python3.6/site-packages/Mikado/picking/picker.py", line 1156, in __call__
    self._parse_and_submit_input(data_dict)
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/lib/python3.6/site-packages/Mikado/picking/picker.py", line 1129, in _parse_and_submit_input
    self.__submit_single_threaded(data_dict)
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/lib/python3.6/site-packages/Mikado/picking/picker.py", line 1053, in __submit_single_threaded
    source=self.json_conf["pick"]["output_format"]["source"])
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/lib/python3.6/site-packages/Mikado/loci/superlocus.py", line 149, in __init__
    super().add_transcript_to_locus(transcript_instance)
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/lib/python3.6/site-packages/Mikado/loci/abstractlocus.py", line 440, in add_transcript_to_locus
    self.add_path_to_graph(transcript, self._internal_graph)
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/lib/python3.6/site-packages/Mikado/loci/abstractlocus.py", line 1154, in add_path_to_graph
    graph.add_path(segments)
AttributeError: 'DiGraph' object has no attribute 'add_path'
======Below are the log files======
prepare.log:
2020-02-10 03:56:24,567 - prepare - prepare.py:67 - INFO - setup - MainProcess - Command line: /storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/bin/mikado prepare --json-conf configuration.yaml
2020-02-10 03:56:26,214 - prepare - prepare.py:447 - INFO - prepare - MainProcess - Finished

serialise.log:
2020-02-10 03:56:29,646 - serialiser - serialise.py:268 - INFO - setup - MainProcess - Command line: /storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/bin/mikado serialise --json-conf configuration.yaml

mikado_pick.log:
2020-02-10 03:56:33,167 - main_logger - picker.py:320 - INFO - setup_logger - MainProcess - Begun analysis of mikado_prepared.gtf
2020-02-10 03:56:33,168 - main_logger - picker.py:322 - INFO - setup_logger - MainProcess - Command line: /storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/bin/mikado pick --json-conf configuration.yaml
2020-02-10 03:56:33,168 - main_logger - picker.py:236 - INFO - setup_shm_db - MainProcess - Copy into a SHM db: False
2020-02-10 03:56:33,169 - listener - picker.py:339 - WARNING - setup_logger - MainProcess - Current level for queue: WARNING
2020-02-10 03:56:33,444 - listener - dbutils.py:54 - WARNING - create_connector - MainProcess - No database found, creating a mock one!
Dear @juntaosdu, many thanks. The bug is triggered by changes in a library package that Mikado uses. Our fault, we should have updated the code. I will issue a bug fix within the next hour; please watch this issue.
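For reference, the library in question is networkx: from version 2.4 the add_path method was removed from graph classes in favour of a module-level function, which is exactly what the pick traceback above shows. A minimal sketch of the change (the segments list below is an invented example, not Mikado's actual data):

import networkx as nx

graph = nx.DiGraph()
segments = [(1, 10), (20, 30), (40, 50)]

# Pre-networkx-2.4 style; on recent versions this raises
# AttributeError: 'DiGraph' object has no attribute 'add_path'
# graph.add_path(segments)

# Current equivalent: the module-level function
nx.add_path(graph, segments)
print(list(graph.edges()))  # [((1, 10), (20, 30)), ((20, 30), (40, 50))]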
Kind regards
By the way, is it normal that mikado serialise did not generate mikado.db with the commands I ran?
Dear @juntaosdu, on review, we fixed this specific issue over a year ago (see here).
I think the problem is that the last official release of Mikado on PyPI is very old, as we are struggling to finalise version 2. This is starting to cause problems like the one you highlighted, and that is on us.
If I may, the best solutions I can recommend are as follows.

Either install the last stable release through Conda, with the following command:

conda create -n mikado1 -c bioconda -- mikado==1.0.2

or install the current development version from GitHub:

git clone https://github.com/EI-CoreBioinformatics/mikado.git
cd mikado
conda env create -f environment.yml # This will create a "mikado2" environment
conda activate mikado2
python setup.py bdist_wheel
pip install dist/*whl

Please see the next comment for your last question.
By the way, is it normal that mikado serialise did not generate mikado.db with the commands I ran?
No it is not, but I think that the problem here is that mikado serialise needs additional data (junctions, BLAST results, ORFs) which has to be generated between prepare and serialise.
For this reason, we wrote a pipeline wrapper (based on Snakemake) to perform all the steps (see here for a tutorial).
Briefly, after installing, I would do the following:
mikado configure --daijin --list list.txt --reference genome.fa --blast_targets <PROTEINS SIMILAR TO YOUR ORGANISM> configuration.yaml
daijin -nd configuration.yaml
If you are using the latest version from here on GitHub, please consider the possibility of using:
daijin --use-conda -nd configuration.yaml
as this will ensure that the required software is installed.
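As an aside, once serialise has completed you can check which data sources actually made it into the database: mikado.db is, by default, a plain SQLite file. A quick sketch (assuming the default SQLite backend and the default database name):

import sqlite3

conn = sqlite3.connect("mikado.db")
# List the tables; expect entries covering ORFs, junctions and BLAST hits
# when all three inputs were provided to serialise.
tables = [row[0] for row in
          conn.execute("SELECT name FROM sqlite_master WHERE type='table'")]
print(tables)
conn.close()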
I hope this helps.
It generates the following when I type "daijin -nd configuration.yaml":

usage: A Directed Acyclic pipeline for gene model reconstruction from RNA seq data. Basically, a pipeline for driving Mikado. It will first align RNAseq reads against a genome using multiple tools, then creates transcript assemblies using multiple tools, and find junctions in the alignments using Portcullis. This input is then passed into Mikado.
       [-h] {configure,assemble,mikado} ...
A Directed Acyclic pipeline for gene model reconstruction from RNA seq data. Basically, a pipeline for driving Mikado. It will first align RNAseq reads against a genome using multiple tools, then creates transcript assemblies using multiple tools, and find junctions in the alignments using Portcullis. This input is then passed into Mikado.: error: invalid choice: 'configuration.yaml' (choose from 'configure', 'assemble', 'mikado')
My fault, it should have been:
daijin mikado -nd configuration.yaml
Again, it reports the following information:
Provided cores: 1000
Rules claiming more threads will be scaled down.
Job counts:
        count   jobs
        1       all
        1       blast_all
        1       genome_index
        1       mikado_collect_stats
        1       mikado_pick
        1       mikado_prepare
        1       mikado_serialise
        1       mikado_stats
        1       prodigal
        9

Job 5: Preparing transcripts using mikado

mikado prepare --start-method=spawn --procs=1 --fasta=genome.fa --json-conf=/scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/Spkied-in/SRR3743144/Mikado/mikadoakkicyfp.yaml -od Daijin/5-mikado 2>&1

rule blast_all:
    output: Daijin/5-mikado/blast/blastx.all.done
    jobid: 6

/bin/bash: mikado: command not found
touch Daijin/5-mikado/blast/blastx.all.done

Job 8: Using samtools to index genome

ln -sf /scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/Spkied-in/SRR3743144/Mikado/genome.fa Daijin/5-mikado/genome.fa && touch -h Daijin/5-mikado/genome.fa && samtools faidx Daijin/5-mikado/genome.fa
/bin/bash: samtools: command not found
Finished job 6. 1 of 9 steps (11%) done
Error in job mikado_prepare while creating output files Daijin/5-mikado/mikado_prepared.gtf, Daijin/5-mikado/mikado_prepared.fasta.
Error in job genome_index while creating output file Daijin/5-mikado/genome.fa.fai.
RuleException:
CalledProcessError in line 86 of /storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/lib/python3.6/site-packages/Mikado/daijin/mikado.snakefile:
Command ' mikado prepare --start-method=spawn --procs=1 --fasta=genome.fa --json-conf=/scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/Spkied-in/SRR3743144/Mikado/mikadoakkicyfp.yaml -od Daijin/5-mikado 2>&1' returned non-zero exit status 127.
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/lib/python3.6/site-packages/Mikado/daijin/mikado.snakefile", line 86, in __rule_mikado_prepare
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/lib/python3.6/concurrent/futures/thread.py", line 55, in run
RuleException:
CalledProcessError in line 230 of /storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/lib/python3.6/site-packages/Mikado/daijin/mikado.snakefile:
Command 'ln -sf /scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/Spkied-in/SRR3743144/Mikado/genome.fa Daijin/5-mikado/genome.fa && touch -h Daijin/5-mikado/genome.fa && samtools faidx Daijin/5-mikado/genome.fa' returned non-zero exit status 127.
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/lib/python3.6/site-packages/Mikado/daijin/mikado.snakefile", line 230, in __rule_genome_index
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/lib/python3.6/concurrent/futures/thread.py", line 55, in run
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message
Dear @juntaosdu, something has gone wrong in installing Mikado. The errors:

/bin/bash: mikado: command not found
/bin/bash: samtools: command not found

indicate that samtools and mikado are not available when running the pipeline. Especially for mikado, this is puzzling.
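As a quick diagnostic (a throw-away sketch, not part of Mikado), you can check from the same environment whether the executables the pipeline needs are visible on PATH:

import shutil

# Each tool should resolve to a path; "NOT FOUND" corresponds to the
# "command not found" / exit status 127 errors above.
for tool in ("mikado", "samtools", "prodigal"):
    print(tool, "->", shutil.which(tool) or "NOT FOUND")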
May I ask exactly which method of installation (and which version of Mikado) you are using, so that I can try to replicate here?
./bin/pip3 install mikado
Provided cores: 1000
Rules claiming more threads will be scaled down.
Job counts:
        count   jobs
        1       all
        1       genome_index
        1       mikado_collect_stats
        1       mikado_pick
        1       mikado_prepare
        1       mikado_serialise
        1       mikado_stats
        1       prodigal
        8

Job 6: Using samtools to index genome

ln -sf /scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/Spkied-in/SRR3743144/Mikado/genome.fa Daijin/5-mikado/genome.fa && touch -h Daijin/5-mikado/genome.fa && samtools faidx Daijin/5-mikado/genome.fa

Job 4: Preparing transcripts using mikado

mikado prepare --start-method=spawn --procs=1 --fasta=genome.fa --json-conf=/scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/Spkied-in/SRR3743144/Mikado/mikadovj4j6zgx.yaml -od Daijin/5-mikado 2>&1
Finished job 6. 1 of 8 steps (12%) done
/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/lib/python3.6/site-packages/Mikado/configuration/configurator.py:529: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  scoring = yaml.load(scoring_file)
Finished job 4. 2 of 8 steps (25%) done

Job 8: Running PRODIGAL on Mikado prepared transcripts: Daijin/5-mikado/mikado_prepared.fasta

mkdir -p /scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/Spkied-in/SRR3743144/Mikado/Daijin/5-mikado/prodigal && cd /scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/Spkied-in/SRR3743144/Mikado/Daijin/5-mikado/prodigal && ln -sf /scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/Spkied-in/SRR3743144/Mikado/Daijin/5-mikado/mikado_prepared.fasta transcripts.fasta && prodigal -f gff -g 1 -i transcripts.fasta -o transcripts.fasta.prodigal.gff3 > /scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/Spkied-in/SRR3743144/Mikado/Daijin/5-mikado/prodigal/prodigal.log 2>&1
Error in job prodigal while creating output file Daijin/5-mikado/prodigal/transcripts.fasta.prodigal.gff3.
RuleException:
CalledProcessError in line 220 of /storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/lib/python3.6/site-packages/Mikado/daijin/mikado.snakefile:
Command ' mkdir -p /scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/Spkied-in/SRR3743144/Mikado/Daijin/5-mikado/prodigal && cd /scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/Spkied-in/SRR3743144/Mikado/Daijin/5-mikado/prodigal && ln -sf /scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/Spkied-in/SRR3743144/Mikado/Daijin/5-mikado/mikado_prepared.fasta transcripts.fasta && prodigal -f gff -g 1 -i transcripts.fasta -o transcripts.fasta.prodigal.gff3 > /scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/Spkied-in/SRR3743144/Mikado/Daijin/5-mikado/prodigal/prodigal.log 2>&1' returned non-zero exit status 127.
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/lib/python3.6/site-packages/Mikado/daijin/mikado.snakefile", line 220, in __rule_prodigal
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/lib/python3.6/concurrent/futures/thread.py", line 55, in run
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message
Dear @juntaosdu, this is progress ... the missing program is Prodigal (https://github.com/hyattpd/Prodigal). Once it is installed, Daijin should have everything needed to run all the steps of Mikado.
Provided cores: 1000
Rules claiming more threads will be scaled down.
Job counts:
        count   jobs
        1       all
        1       mikado_collect_stats
        1       mikado_pick
        1       mikado_serialise
        1       mikado_stats
        1       prodigal
        6

Job 6: Running PRODIGAL on Mikado prepared transcripts: Daijin/5-mikado/mikado_prepared.fasta

mkdir -p /scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/Spkied-in/SRR3743144/Mikado/Daijin/5-mikado/prodigal && cd /scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/Spkied-in/SRR3743144/Mikado/Daijin/5-mikado/prodigal && ln -sf /scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/Spkied-in/SRR3743144/Mikado/Daijin/5-mikado/mikado_prepared.fasta transcripts.fasta && prodigal -f gff -g 1 -i transcripts.fasta -o transcripts.fasta.prodigal.gff3 > /scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/Spkied-in/SRR3743144/Mikado/Daijin/5-mikado/prodigal/prodigal.log 2>&1
Finished job 6. 1 of 6 steps (17%) done

Job 4: Running Mikado serialise to move numerous data sources into a single database

mikado serialise --start-method=spawn --transcripts=Daijin/5-mikado/mikado_prepared.fasta --genome_fai=Daijin/5-mikado/genome.fa.fai --json-conf=/scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/Spkied-in/SRR3743144/Mikado/mikadovfafdkp3.yaml --force --orfs=Daijin/5-mikado/prodigal/transcripts.fasta.prodigal.gff3 -od Daijin/5-mikado --procs=1 > Daijin/5-mikado/mikado_serialise.err 2>&1
Finished job 4. 2 of 6 steps (33%) done

Job 3: Running mikado picking stage

mikado pick --source Mikado_permissive --mode=permissive --procs=1 --start-method=spawn --json-conf=/scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/Spkied-in/SRR3743144/Mikado/mikadovfafdkp3.yaml -od Daijin/5-mikado/pick/permissive --loci_out mikado-permissive.loci.gff3 -lv INFO Daijin/5-mikado/mikado_prepared.gtf -db Daijin/5-mikado/mikado.db > Daijin/5-mikado/pick/permissive/mikado-permissive.pick.err 2>&1
Error in job mikado_pick while creating output file Daijin/5-mikado/pick/permissive/mikado-permissive.loci.gff3.
RuleException:
CalledProcessError in line 269 of /storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/lib/python3.6/site-packages/Mikado/daijin/mikado.snakefile:
Command ' mikado pick --source Mikado_permissive --mode=permissive --procs=1 --start-method=spawn --json-conf=/scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/Spkied-in/SRR3743144/Mikado/mikadovfafdkp3.yaml -od Daijin/5-mikado/pick/permissive --loci_out mikado-permissive.loci.gff3 -lv INFO Daijin/5-mikado/mikado_prepared.gtf -db Daijin/5-mikado/mikado.db > Daijin/5-mikado/pick/permissive/mikado-permissive.pick.err 2>&1' returned non-zero exit status 1.
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/lib/python3.6/site-packages/Mikado/daijin/mikado.snakefile", line 269, in __rule_mikado_pick
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local2/lib/python3.6/concurrent/futures/thread.py", line 55, in run
Removing output files of failed job mikado_pick since they might be corrupted:
Daijin/5-mikado/pick/permissive/mikado-permissive.loci.gff3
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message
Dear @juntaosdu, could you please attach (or copy here) the contents of Daijin/5-mikado/pick/permissive/mikado-permissive.pick.err?
Details on the crash should be there.
I reinstalled mikado via conda, and it seems correct.
Then I ran daijin mikado -nd configuration.yaml, and that also seems correct.
Then I ran mikado prepare --json-conf configuration.yaml and mikado serialise --json-conf configuration.yaml, and there is still no mikado.db file.
Then I ran mikado pick --json-conf configuration.yaml, and got the following:
--- Logging error ---
Traceback (most recent call last):
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/ENTER/envs/mikado2/lib/python3.7/logging/__init__.py", line 1034, in emit
    msg = self.format(record)
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/ENTER/envs/mikado2/lib/python3.7/logging/__init__.py", line 880, in format
    return fmt.format(record)
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/ENTER/envs/mikado2/lib/python3.7/logging/__init__.py", line 619, in format
    record.message = record.getMessage()
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/ENTER/envs/mikado2/lib/python3.7/logging/__init__.py", line 380, in getMessage
    msg = msg % self.args
TypeError: not all arguments converted during string formatting
Call stack:
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/ENTER/envs/mikado2/bin/mikado", line 8, in <module>
However, I found a file at "Daijin/5-mikado/pick/permissive/mikado-permissive.loci.gff3". Is this the final output assembly from Mikado?
Trouble again! When I switched to another dataset, it got the following error:

Building DAG of jobs...
Creating conda environment /storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/ENTER/envs/mikado2/lib/python3.7/site-packages/Mikado/daijin/envs/samtools.yaml...
Downloading and installing remote packages.
Environment for ../../../../../../../../storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/ENTER/envs/mikado2/lib/python3.7/site-packages/Mikado/daijin/envs/samtools.yaml created (location: .snakemake/conda/c618eb7f)
Creating conda environment /storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/ENTER/envs/mikado2/lib/python3.7/site-packages/Mikado/daijin/envs/prodigal.yaml...
Downloading and installing remote packages.
Environment for ../../../../../../../../storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/ENTER/envs/mikado2/lib/python3.7/site-packages/Mikado/daijin/envs/prodigal.yaml created (location: .snakemake/conda/8d2bd5fa)
Using shell: /bin/bash
Provided cores: 1000
Rules claiming more threads will be scaled down.
Job counts:
        count   jobs
        1       all
        1       blast_all
        1       genome_index
        1       mikado_collect_stats
        1       mikado_pick
        1       mikado_prepare
        1       mikado_serialise
        1       mikado_stats
        1       prodigal
        9
[Mon Feb 10 07:44:41 2020]
Job 4: Preparing transcripts using mikado
Reason: Missing output files: Daijin/5-mikado/mikado_prepared.fasta, Daijin/5-mikado/mikado_prepared.gtf

mikado prepare -l Daijin/5-mikado/mikado_prepare.log --start-method=spawn --fasta=/storage/juntaosdu/yuting/reference_gtf/Human_genome.fa --json-conf=/scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/S150/Mikado/mikadols9s9bb5.yaml -od Daijin/5-mikado 2>&1

[Mon Feb 10 07:44:41 2020]
rule blast_all:
    output: Daijin/5-mikado/blast/blastx.all.done
    jobid: 6
    reason: Missing output files: Daijin/5-mikado/blast/blastx.all.done

touch Daijin/5-mikado/blast/blastx.all.done

[Mon Feb 10 07:44:41 2020]
Job 8: Using samtools to index genome
Reason: Missing output files: Daijin/5-mikado/Human_genome.fa.fai

ln -sf /storage/juntaosdu/yuting/reference_gtf/Human_genome.fa Daijin/5-mikado/Human_genome.fa && touch -h Daijin/5-mikado/Human_genome.fa && samtools faidx Daijin/5-mikado/Human_genome.fa
[Mon Feb 10 07:44:41 2020] Finished job 6. 1 of 9 steps (11%) done
Activating conda environment: /scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/S150/Mikado/.snakemake/conda/c618eb7f
[Mon Feb 10 07:44:58 2020] Finished job 8. 2 of 9 steps (22%) done
[Mon Feb 10 07:47:07 2020] Finished job 4. 3 of 9 steps (33%) done

[Mon Feb 10 07:47:07 2020]
Job 7: Running PRODIGAL on Mikado prepared transcripts: Daijin/5-mikado/mikado_prepared.fasta
Reason: Missing output files: Daijin/5-mikado/prodigal/transcripts.fasta.prodigal.gff3; Input files updated by another job: Daijin/5-mikado/mikado_prepared.fasta

mkdir -p /scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/S150/Mikado/Daijin/5-mikado/prodigal && cd /scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/S150/Mikado/Daijin/5-mikado/prodigal && ln -sf /scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/S150/Mikado/Daijin/5-mikado/mikado_prepared.fasta transcripts.fasta && prodigal -f gff -g 1 -i transcripts.fasta -o transcripts.fasta.prodigal.gff3 > /scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/S150/Mikado/Daijin/5-mikado/prodigal/prodigal.log 2>&1
Activating conda environment: /scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/S150/Mikado/.snakemake/conda/8d2bd5fa
[Mon Feb 10 07:48:49 2020] Finished job 7. 4 of 9 steps (44%) done

[Mon Feb 10 07:48:49 2020]
rule mikado_serialise:
    input: Daijin/5-mikado/blast/blastx.all.done, Daijin/5-mikado/prodigal/transcripts.fasta.prodigal.gff3, Daijin/5-mikado/Human_genome.fa.fai, Daijin/5-mikado/mikado_prepared.fasta
    output: Daijin/5-mikado/mikado.db
    log: Daijin/5-mikado/mikado_serialise.log
    jobid: 5
    reason: Missing output files: Daijin/5-mikado/mikado.db; Input files updated by another job: Daijin/5-mikado/mikado_prepared.fasta, Daijin/5-mikado/Human_genome.fa.fai, Daijin/5-mikado/prodigal/transcripts.fasta.prodigal.gff3, Daijin/5-mikado/blast/blastx.all.done

mikado serialise --start-method=spawn --transcripts=Daijin/5-mikado/mikado_prepared.fasta --genome_fai=Daijin/5-mikado/Human_genome.fa.fai --json-conf=/scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/S150/Mikado/mikadols9s9bb5.yaml --force --orfs=Daijin/5-mikado/prodigal/transcripts.fasta.prodigal.gff3 -od Daijin/5-mikado --procs=1 -l Daijin/5-mikado/mikado_serialise.log
2020-02-10 07:49:07,204 - main - __init__.py:120 - ERROR - main - MainProcess - Mikado crashed, cause:
2020-02-10 07:49:07,204 - main - __init__.py:121 - ERROR - main - MainProcess -
Removing output files of failed job mikado_serialise since they might be corrupted:
Daijin/5-mikado/mikado.db
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /scrfs/storage/juntaosdu/yuting/TransBorrow-Revision/Data/S150/Mikado/.snakemake/log/2020-02-10T074413.461781.snakemake.log
Error: cannot open input file Daijin/5-mikado/pick/permissive/mikado-permissive.loci.gff3!
How can I get the correct result? Thank you very much!
When I tried version 1.0, it did not report any error and output two GFF files. That seems good, but it did not generate the mikado.db file. However, mikado pick printed the line "No database found, creating a mock one!". Would that be OK? Can I use these generated files? Thank you!
When I tried version 1.0, it did not report any error and output two GFF files. That seems good, but it did not generate the mikado.db file. However, mikado pick printed the line "No database found, creating a mock one!". Would that be OK? Can I use these generated files? Thank you!
Dear @juntaosdu, no, unfortunately those files will not be useful, as the Mikado database, which contains important information, is not present.
Please allow me to try to summarise what has happened so far; correct me wherever my understanding is wrong:

- mikado serialise needs input files which were not specified at runtime (daijin should automatically take care of this).
- You created the conda environment mikado2 and ran daijin within it. This failed at the mikado serialise step, from what I understand from the log, due to a failure in reading the prodigal results. Specifically, when parsing the GFF there were lines where the fourth and fifth fields did not hold a numeric value.

The last bit is puzzling. Could you please check whether the file Daijin/5-mikado/prodigal/transcripts.fasta.prodigal.gff3 has any line, not starting with a #, where the fourth or the fifth column is not an integer? That is what triggers the error.
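If it helps, a throw-away check along these lines (a sketch, not part of Mikado) would print any offending line together with its line number:

# Print any non-comment line of the Prodigal GFF3 whose start (4th) or
# end (5th) column is not a plain integer.
path = "Daijin/5-mikado/prodigal/transcripts.fasta.prodigal.gff3"
with open(path) as gff:
    for number, line in enumerate(gff, 1):
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5 or not (fields[3].isdigit() and fields[4].isdigit()):
            print(number, line.rstrip())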
Hi, I have checked the .gff3 file and found that, in every line not starting with a #, both the fourth and fifth columns are integers. Is there anything else I should do?
Dear @juntaosdu , thank you for trying that.
I have a further hypothesis, i.e. that nothing is wrong with the GFF or Mikado, but rather that this is a bug (fixed upstream) in the fastnumbers package. Could you please try:

pip install -U "fastnumbers>=3.0"

within the mikado2 conda environment?
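To make sure the upgrade takes effect in the same environment Mikado runs in, a quick hypothetical check:

import fastnumbers

# The version attribute is an assumption (most packages expose it).
print(getattr(fastnumbers, "__version__", "unknown"))  # expect >= 3.0
print(fastnumbers.fast_int("1234"))                    # should print 1234 as an int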
I am also considering removing this dependency completely and going back to the built-in functions of Python 3; the author of the library himself reported that recent improvements make his project somewhat redundant on Python 3.7 and later.
I tried pip install -U "fastnumbers>=3.0" and it installed successfully. However, when I ran daijin mikado -nd configuration.yaml, I received the same error report. What should I do next?
Dear @juntaosdu , many thanks for trying this.
I am currently puzzled by what is actually happening. I think the best way forward would be for you to send me your files privately, so I can diagnose the issue more quickly. My email is lucventurini AT gmail DOT com. I would need the configuration file, the mikado_prepared.fasta, and the Prodigal GFF3. With those, I should be able to diagnose what is triggering the bug. I will treat the data confidentially and delete it after diagnosing and fixing the issue.
As an alternative workaround, you can change the following field in the configuration file from false to true:

use_transdecoder: false

then delete the prodigal folder and relaunch daijin. This will use TransDecoder rather than Prodigal; as TransDecoder has a different output format (BED12), it should not trigger this bug at all.
If you would like instead to help diagnose and fix this bug for good, there are a couple of things that can be done:

1. Change /storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/ENTER/envs/mikado2/lib/python3.7/site-packages/Mikado/parsers/gfannotation.py line 92 from:

self.start, self.end = tuple(fastnumbers.fast_int(i) for i in self._fields[3:5])

to:

self.start, self.end = tuple(int(i) for i in self._fields[3:5])

to verify whether this solves the bug.

2. Change /storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/ENTER/envs/mikado2/lib/python3.7/site-packages/Mikado/parsers/bed12.py at line 1394 from:

line = GffLine(line)

to:

try:
    line = GffLine(line)
except KeyboardInterrupt:
    raise
except Exception as exc:
    self.logger.exception("This line raises an error (%s): %s", exc, line)
    continue
Many thanks for your patience.
It is running and seems good. By the way, is the file Daijin/5-mikado/pick/permissive/mikado-permissive.loci.gff3 the final output file?
is the file Daijin/5-mikado/pick/permissive/mikado-permissive.loci.gff3 the final output file?
Yes, those are the final selected transcripts. Glad to hear that it is managing to get past the error.
May I ask which option you chose to pursue (TransDecoder, or the code changes I suggested above)? This would help pin down the bug.
Kind regards
The bug fixes.
Thank you. If you could check the serialise.log after the step has finished, to see which lines are creating an issue (if any), I would be grateful.
I will probably remove fastnumbers from that step; you are not the first to report a bug there.
Thank you.
How do I check the serialise.log file?
I meant this file within the working directory:
Daijin/5-mikado/mikado_serialise.log
Apologies for not having been clearer.
It has finished running on my simulated dataset successfully! I used cuffcompare to evaluate its performance, and the results showed that it performed rather poorly: e.g. StringTie correctly assembled 10091 transcripts, while Mikado's count is only 4411. Is there anything wrong?
I ran Mikado by combining the assemblies from StringTie, StringTie2, Scallop, and Cufflinks.
Dear @juntaosdu , for starters, glad to hear that the run finished successfully!
Now, regarding performance ... I am sorry to hear that it is far lower than you expected. However, I would need some more details to help. For starters, the scoring file used (plant.yaml) might not be the most appropriate for your species.

On a more general note, Mikado tries to recover transcripts that are protein coding or at most lncRNAs. When we simulated our data for H. sapiens, for example, we excluded many biotypes from the simulation. We also do not consider the original alignments, and we did not define correctness in terms of congruence with the original data. Specifically, if at a locus the most expressed transcript is a retained-intron event that breaks the CDS, with very little expression for a transcript with the correct splicing, Mikado will try to bring back the correct transcript without the retained intron, even if it has far less read expression support.
I have seen your TransBorrow tool on SourceForge (it looks like a very interesting project!), but it seems to have a very different philosophy, i.e. it seems (to my eye) to try to reconstruct the best transcripts given the alignment data. This is very valuable but quite different from what Mikado tries to do (i.e. trying to recover good gene models from the various options, regardless of which one has the most read support, starting from a preconception of how a gene model should look).
What should I do if I want to test Mikado on human RNA-seq data using only the assemblies from other assemblers such as StringTie?
Dear @juntaosdu, then I would do the following, editing the configuration file:

- change plant.yaml to mammalian.yaml;
- change the mode from permissive to stringent (this will prevent spurious splitting in the absence of BLAST data), i.e. change modes = ["permissive"] to modes = ["stringent"].

I would stress, though, that Mikado is meant to integrate BLAST and junction analysis results together with the ORF data; it will not operate at its best without those.
Thank you very much for the helpful advice!
Hi @juntaosdu
As Luca indicated, Mikado probably isn't directly comparable to your TransBorrow tool. It's not clear from the above comments how you are running Mikado: the serialise step integrates multiple data sources, i.e. ORFs (prodigal), alignments (diamond) and junctions (portcullis). While you can functionally run Mikado with just one of these inputs, you would not be advised to, and the results will be poor.

If you have multiple transcript assemblies for human, then the fair use of Mikado would be to read https://mikado.readthedocs.io/en/latest/Tutorial/index.html, use the mammalian.yaml scoring, and:

1. run mikado prepare
2. run prodigal (prodigal -g 1) on the mikado prepare fasta
3. create a mammalian diamond db of a few related species, e.g. mouse, chimp etc., and align the mikado prepare fasta to that
4. run portcullis on the BAM of the aligned RNA-Seq reads to get the BED junctions file that passes filters
5. run mikado serialise with the above inputs
6. run mikado pick
Mikado is a framework that allows you to select from a pool of transcripts based on user requirements; it is not designed to be run with only transcript assemblies and no additional data.
Also @juntaosdu, if you run without junctions (NOT ADVISED) then you need to update the mammalian.yaml scoring file, as we only retain transcripts with some junction support: i.e. remove the proportion_verified_introns_inlocus bits from the requirements section:

expression: [(combined_cds_fraction.ncrna or combined_cds_fraction.coding) and ((exon_num.multi and (cdna_length.multi or combined_cds_length.multi) and max_intron_length and min_intron_length and proportion_verified_introns_inlocus), or, (exon_num.mono and (combined_cds_length.mono or cdna_length.mono)))]
proportion_verified_introns_inlocus: {operator: gt, value: 0}
thank you
Hi, could you please show me the detailed commands for each of the above steps?
You mentioned "remove the proportion_verified_introns_inlocus bits from the requirements section". How do I do this?
Hi @juntaosdu, https://mikado.readthedocs.io/en/latest/Tutorial/index.html covers the Mikado steps.
Start with mikado configure, i.e. first generate the list.txt file and then run:
mikado configure --list list.txt --reference myref.fa --mode stringent --scoring mammalian.yaml --copy-scoring mammalian.yaml configuration.toml
then
mikado prepare -p <number of processes> --json-conf configuration.toml
You mentioned "remove the proportion_verified_introns_inlocus bits from the requirements section". How do I do this?
This relates to the scoring file mammalian.yaml that mikado configure will have copied to your working directory. BUT this only needs to be changed if you DON'T generate the junctions.
If you run pick without junctions, then change the top section of the scoring file to:
# Scoring file suitable for any species with intron sizes similar to mammals
requirements:
  expression: [(combined_cds_fraction.ncrna or combined_cds_fraction.coding) and ((exon_num.multi and (cdna_length.multi or combined_cds_length.multi) and max_intron_length and min_intron_length), or, (exon_num.mono and (combined_cds_length.mono or cdna_length.mono)))]
  parameters:
    combined_cds_fraction.ncrna: {operator: eq, value: 0}
    combined_cds_fraction.coding: {operator: gt, value: 0.30}
    cdna_length.mono: {operator: gt, value: 400}
    cdna_length.multi: {operator: ge, value: 300}
    combined_cds_length.mono: {operator: gt, value: 225}
    combined_cds_length.multi: {operator: gt, value: 150}
    exon_num.mono: {operator: eq, value: 1}
    exon_num.multi: {operator: gt, value: 1}
    max_intron_length: {operator: le, value: 1000000}
    min_intron_length: {operator: ge, value: 5}
i.e. remove the requirement for transcripts to have junction support (AGAIN NOT ADVISED)
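If you want to double-check the edit, the file must still parse as YAML, and the requirements section must no longer mention the junction metric anywhere. A quick hypothetical check (assuming the file is the mammalian.yaml copied into your working directory):

import yaml

with open("mammalian.yaml") as handle:
    scoring = yaml.safe_load(handle)

requirements = scoring["requirements"]
# The expression is a YAML flow list; join its parts before searching.
expression = " ".join(str(part) for part in requirements["expression"])
assert "proportion_verified_introns_inlocus" not in expression
assert "proportion_verified_introns_inlocus" not in requirements["parameters"]
print("requirements section OK")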
Dear authors,
Recently I was using your tool Mikado on my data. Below are the commands I ran:

mikado configure --list list.txt --reference genome.fa configuration.yaml
mikado prepare --json-conf configuration.yaml
mikado serialise --json-conf configuration.yaml
mikado pick --json-conf configuration.yaml
When mikado serialise finished, it did not generate the file mikado.db. Then I ran mikado pick and it printed the following:

--- Logging error ---
Traceback (most recent call last):
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local3/lib/python3.6/logging/__init__.py", line 992, in emit
    msg = self.format(record)
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local3/lib/python3.6/logging/__init__.py", line 838, in format
    return fmt.format(record)
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local3/lib/python3.6/logging/__init__.py", line 575, in format
    record.message = record.getMessage()
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local3/lib/python3.6/logging/__init__.py", line 338, in getMessage
    msg = msg % self.args
TypeError: not all arguments converted during string formatting
Call stack:
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local3/bin/mikado", line 8, in <module>
    sys.exit(main())
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local3/lib/python3.6/site-packages/Mikado/__init__.py", line 106, in main
    args.func(args)
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local3/lib/python3.6/site-packages/Mikado/subprograms/prepare.py", line 165, in prepare_launcher
    prepare(args, logger)
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local3/lib/python3.6/site-packages/Mikado/preparation/prepare.py", line 428, in prepare
    perform_check(sorter(shelf_stacks), shelf_stacks, args, logger)
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local3/lib/python3.6/site-packages/Mikado/preparation/prepare.py", line 165, in perform_check
    strand_specific=tobj["strand_specific"])
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local3/lib/python3.6/site-packages/Mikado/preparation/checking.py", line 78, in create_transcript
    transcript_object.check_strand()
  File "/storage/juntaosdu/yuting/TransBorrow-Revision/Assemblers/local3/lib/python3.6/site-packages/Mikado/transcripts/transcriptchecker.py", line 216, in check_strand
    canonical_counter["-"])
Message: 'Transcript %s has been assigned to the wrong strand, tagging it but leaving it on this strand.'
Arguments: ('sca_bundle.18040.0.4', 0, 1)
......
Could you please help handle this?
Thank you very much!
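For reference, the "--- Logging error ---" block above is raised by Python's logging machinery itself, not by the pipeline logic: judging from the Message/Arguments printed, the format string contains a single %s placeholder while three arguments are supplied, so formatting the record fails with "not all arguments converted during string formatting". A minimal sketch reproducing the same report:

import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("demo")

# One placeholder, three arguments: logging catches the TypeError while
# emitting the record and prints a "--- Logging error ---" report instead
# of crashing the program.
logger.warning(
    "Transcript %s has been assigned to the wrong strand, tagging it but "
    "leaving it on this strand.",
    "sca_bundle.18040.0.4", 0, 1)

Note that this report is non-fatal: the run continues past it, so it does not by itself explain the missing mikado.db.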