bhattlab / MGEfinder

A toolbox for identifying mobile genetic element (MGE) insertions from short-read sequencing data of bacterial isolates.
MIT License

Error in job make_database while creating output files #36

Closed: gvic628 closed this issue 2 years ago

gvic628 commented 2 years ago

Hello,

I'm having trouble running MGEfinder on my own data. I was able to complete the step-by-step tutorial without any issues: the "mgefinder workflow denovo" command ran fine and the correct output files were generated.

For my own data, I created a directory called "workdir", which included three directories:

To run mgefinder, I used the following command: mgefinder workflow denovo -t 10 workdir/

The program terminated with only 53% of the analysis completed. I've attached the error file. The issue seems to be related to the following portion of it:

rule make_database:
    input: workdir/01.mgefinder/flye_final_polished/flye_final_polished.all_inferseq.txt
    output: workdir/02.database/flye_final_polished/flye_final_polished.database.fna, workdir/02.database/flye_final_polished/flye_final_polished.database.fna.1.bt2
    jobid: 12
    benchmark: workdir/02.database/flye_final_polished/flye_final_polished.database.benchmark.txt
    wildcards: genome=flye_final_polished
    threads: 10

Waiting at most 5 seconds for missing files.
Error in job make_database while creating output files workdir/02.database/flye_final_polished/flye_final_polished.database.fna, workdir/02.database/flye_final_polished/flye_final_polished.database.fna.1.bt2.
MissingOutputException in line 192 of /scratch2/software/anaconda/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/denovo.original.Snakefile:
Missing files after 5 seconds:
workdir/02.database/flye_final_polished/flye_final_polished.database.fna
workdir/02.database/flye_final_polished/flye_final_polished.database.fna.1.bt2
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message

Any insight and assistance would be appreciated! Thank you!

Attachments: mgefinder.sh.e180713.txt, mgefinder.sh.o180713.txt

durrantmm commented 2 years ago

Thanks for reaching out for help! Please show me the contents of the directory workdir/02.database/flye_final_polished/ with a command such as

ls -1 workdir/02.database/flye_final_polished/

If there are any log files in that directory, please attach them here as well.
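The same check can be scripted. Below is a minimal Python sketch that lists the directory (like ls -1) and reports which of the two make_database outputs named in the MissingOutputException are absent. missing_database_outputs is a hypothetical helper written for illustration, not part of MGEfinder; the paths simply mirror the ones in the error message.

```python
from pathlib import Path


def missing_database_outputs(workdir, genome):
    """Return the make_database outputs that do not exist for one genome.

    The two expected paths mirror the outputs listed in the
    MissingOutputException: 02.database/<genome>/<genome>.database.fna
    and its bowtie2 index file <genome>.database.fna.1.bt2.
    """
    db_dir = Path(workdir) / "02.database" / genome
    expected = [
        db_dir / f"{genome}.database.fna",
        db_dir / f"{genome}.database.fna.1.bt2",
    ]
    return [str(path) for path in expected if not path.exists()]


if __name__ == "__main__":
    genome = "flye_final_polished"
    db_dir = Path("workdir") / "02.database" / genome
    # Print whatever is in the directory, one name per line (like `ls -1`).
    if db_dir.is_dir():
        for entry in sorted(db_dir.iterdir()):
            print(entry.name)
    # Then report which expected outputs were never written.
    for path in missing_database_outputs("workdir", genome):
        print("missing:", path)
```

If both files are reported missing and the directory holds only the benchmark file, the make_database step wrote nothing at all, so raising --latency-wait is unlikely to help.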

Thank you!

gvic628 commented 2 years ago

Hello!

Thanks so much for your help! I've attached the contents of "workdir/02.database/flye_final_polished/" here. There was only the single .txt file. No log files in that directory.

Log directories were created in 00.assembly and 00.genome.

workdir/01.mgefinder/flye_final_polished/ contains a "cab" directory with a number of .tsv files and a log directory, which also contains several files. Let me know if any of those would be helpful.

Attachment: flye_final_polished.database.benchmark.txt

krishna1925 commented 2 years ago

Hi! I tried to use MGEfinder to search for possible mobile genetic elements. However, when I ran the command, it stopped with the following message:

COMMAND: snakemake -s /home/kpa26/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/database.sensitive.Snakefile --config wd=779assemblies/ memory=16000 --cores 1 --configfile /home/kpa26/miniconda3/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/database.sensitive.config.yml
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
    count   jobs
    1   all
    1

rule all:
    jobid: 0

Finished job 0.
1 of 1 steps (100%) done

I was wondering if you could help me to fix this issue.

Thank you. Krishna
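One plausible explanation for a run that finishes after scheduling only rule all (not confirmed anywhere in this thread) is that Snakemake found no input files from which to build jobs. The minimal Python sketch below counts candidate inputs in a working directory. It assumes the 00.genome / 00.assembly / 00.bam layout and file extensions shown in the tree output of a denovo run further down this thread; check_workdir_layout and EXPECTED_INPUTS are hypothetical names, and the database workflow's exact requirements may differ.

```python
from pathlib import Path

# ASSUMPTION: subdirectory -> input file suffix, taken from the `tree` output
# of a denovo run in this thread; other workflows may expect something else.
EXPECTED_INPUTS = {
    "00.genome": ".fna",
    "00.assembly": ".fna",
    "00.bam": ".bam",
}


def check_workdir_layout(workdir):
    """Return {subdirectory: number of matching input files} for a workdir.

    Index and auxiliary files (e.g. .fna.1.bt2, .bam.bai) are not counted,
    because their final suffix differs from the expected one.
    """
    counts = {}
    for subdir, suffix in EXPECTED_INPUTS.items():
        path = Path(workdir) / subdir
        if path.is_dir():
            counts[subdir] = sum(
                1 for f in path.iterdir() if f.is_file() and f.suffix == suffix
            )
        else:
            counts[subdir] = 0  # directory missing entirely
    return counts


if __name__ == "__main__":
    for subdir, n in check_workdir_layout("779assemblies").items():
        print(f"{subdir}: {n} input file(s)")
```

If every count is 0, the working directory probably does not follow the layout the workflow scans, which would be consistent with a run that has nothing to do beyond rule all.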

wanqiangdehuoguo commented 2 years ago

Hello, I have the same problem. I ran:

mgefinder workflow denovo workdir

and encountered this error:

#### CHECKING DEPENDENCIES ####
Current version of snakemake: 3.13.3
Expected version of snakemake: 3.13.3
Current version of einverted: EMBOSS:6.6.0.0
Expected version of einverted: EMBOSS:6.6.0.0
Current version of bowtie2: 2.3.5
Expected version of bowtie2: 2.3.5
Current version of samtools: 1.9
Expected version of samtools: 1.9
Current version of cd-hit: 4.8.1
Expected version of cd-hit: 4.8.1
###############################
Get help documentation with --help.
Get version with --version.
#### PARAMETERS ####
command: workflow
workdir: workdir
cores: 1
memory: 16000
unlock: False
rerun_incomplete: False
keep_going: False
sensitive: False
####################
COMMAND: snakemake -s /home/bio/.conda/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/denovo.original.Snakefile --config wd=workdir memory=16000 --cores 1 --configfile /home/bio/.conda/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/denovo.original.config.yml 
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
    count   jobs
    1   all
    1   clusterseq
    1   genotype
    1   inferseq_database
    1   make_database
    1   make_inferseq_database_file_path_list
    1   make_pair_file_path_list
    1   makefasta
    1   summarize
    9

rule make_database:
    input: workdir/01.mgefinder/GXDK6_genome/GXDK6_genome.all_inferseq.txt
    output: workdir/02.database/GXDK6_genome/GXDK6_genome.database.fna, workdir/02.database/GXDK6_genome/GXDK6_genome.database.fna.1.bt2
    jobid: 14
    benchmark: workdir/02.database/GXDK6_genome/GXDK6_genome.database.benchmark.txt
    wildcards: genome=GXDK6_genome

#### CHECKING DEPENDENCIES ####
Current version of snakemake: 3.13.3
Expected version of snakemake: 3.13.3
Current version of einverted: EMBOSS:6.6.0.0
Expected version of einverted: EMBOSS:6.6.0.0
Current version of bowtie2: 2.3.5
Expected version of bowtie2: 2.3.5
Current version of samtools: 1.9
Expected version of samtools: 1.9
Current version of cd-hit: 4.8.1
Expected version of cd-hit: 4.8.1
###############################
Get help documentation with --help.
Get version with --version.
#### PARAMETERS ####
command: makedatabase
inferseqfiles: ('workdir/01.mgefinder/GXDK6_genome/GXDK6_genome.all_inferseq.txt',)
minimum_size: 30
maximum_size: 200000
threads: 1
memory: 16000
force: True
output_dir: workdir/02.database/GXDK6_genome
prefix: GXDK6_genome.database
####################
Parsing inferseq files
Combining the inferseq files...
Loading file 1/3: workdir/01.mgefinder/GXDK6_genome/GXDK6_contigs/03.inferseq_assembly.GXDK6_contigs.GXDK6_genome.tsv
Loading file 2/3: workdir/01.mgefinder/GXDK6_genome/GXDK6_contigs/03.inferseq_reference.GXDK6_contigs.GXDK6_genome.tsv
Loading file 3/3: workdir/01.mgefinder/GXDK6_genome/GXDK6_contigs/03.inferseq_overlap.GXDK6_contigs.GXDK6_genome.tsv
Deleting old database directory...
No termini found in the input file...
Waiting at most 5 seconds for missing files.
Error in job make_database while creating output files workdir/02.database/GXDK6_genome/GXDK6_genome.database.fna, workdir/02.database/GXDK6_genome/GXDK6_genome.database.fna.1.bt2.
MissingOutputException in line 192 of /home/bio/.conda/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/denovo.original.Snakefile:
Missing files after 5 seconds:
workdir/02.database/GXDK6_genome/GXDK6_genome.database.fna
workdir/02.database/GXDK6_genome/GXDK6_genome.database.fna.1.bt2
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message
Traceback (most recent call last):
  File "/home/bio/.conda/envs/mgefinder/bin/mgefinder", line 8, in <module>
    sys.exit(cli())
  File "/home/bio/.conda/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/bio/.conda/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/bio/.conda/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/bio/.conda/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/bio/.conda/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/bio/.conda/envs/mgefinder/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/bio/.conda/envs/mgefinder/lib/python3.6/site-packages/mgefinder/main.py", line 78, in denovo
    _workflow(workdir, snakefile, configfile, cores, memory, unlock, rerun_incomplete, keep_going)
  File "/home/bio/.conda/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow.py", line 25, in _workflow
    shell(cmd)
  File "/home/bio/.conda/envs/mgefinder/lib/python3.6/site-packages/snakemake/shell.py", line 88, in __new__
    raise sp.CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'snakemake -s /home/bio/.conda/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/denovo.original.Snakefile --config wd=workdir memory=16000 --cores 1 --configfile /home/bio/.conda/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/denovo.original.config.yml ' returned non-zero exit status 1.

Then I ran:

cd workdir
tree 0*/

and got:

00.assembly
├── GXDK6_contigs.fna
├── GXDK6_contigs.fna.1.bt2
├── GXDK6_contigs.fna.2.bt2
├── GXDK6_contigs.fna.3.bt2
├── GXDK6_contigs.fna.4.bt2
├── GXDK6_contigs.fna.rev.1.bt2
├── GXDK6_contigs.fna.rev.2.bt2
└── log
    ├── GXDK6_contigs.index_assembly.benchmark.txt
    ├── GXDK6_contigs.index_assembly.log
    └── GXDK6_contigs.index_assembly.log.err
00.bam
├── GXDK6_contigs.GXDK6_genome.bam
└── GXDK6_contigs.GXDK6_genome.bam.bai
00.genome
├── GXDK6_genome.fna
├── GXDK6_genome.fna.1.bt2
├── GXDK6_genome.fna.2.bt2
├── GXDK6_genome.fna.3.bt2
├── GXDK6_genome.fna.4.bt2
├── GXDK6_genome.fna.rev.1.bt2
├── GXDK6_genome.fna.rev.2.bt2
└── log
    ├── GXDK6_genome.index_bowtie2.benchmark.txt
    ├── GXDK6_genome.index_bowtie2.log
    └── GXDK6_genome.index_bowtie2.log.err
01.mgefinder
└── GXDK6_genome
    ├── GXDK6_contigs
    │   ├── 01.find.GXDK6_contigs.GXDK6_genome.tsv
    │   ├── 02.pair.GXDK6_contigs.GXDK6_genome.tsv
    │   ├── 03.inferseq_assembly.GXDK6_contigs.GXDK6_genome.tsv
    │   ├── 03.inferseq_overlap.GXDK6_contigs.GXDK6_genome.tsv
    │   ├── 03.inferseq_reference.GXDK6_contigs.GXDK6_genome.tsv
    │   └── log
    │       ├── GXDK6_contigs.GXDK6_genome.benchmark.txt
    │       ├── GXDK6_contigs.GXDK6_genome.find.benchmark.txt
    │       ├── GXDK6_contigs.GXDK6_genome.find.log
    │       ├── GXDK6_contigs.GXDK6_genome.find.log.err
    │       ├── GXDK6_contigs.GXDK6_genome.inferseq_assembly.benchmark.txt
    │       ├── GXDK6_contigs.GXDK6_genome.inferseq_assembly.log
    │       ├── GXDK6_contigs.GXDK6_genome.inferseq_assembly.log.err
    │       ├── GXDK6_contigs.GXDK6_genome.inferseq_overlap.log
    │       ├── GXDK6_contigs.GXDK6_genome.inferseq_overlap.log.err
    │       ├── GXDK6_contigs.GXDK6_genome.inferseq_reference.benchmark.txt
    │       ├── GXDK6_contigs.GXDK6_genome.inferseq_reference.log
    │       ├── GXDK6_contigs.GXDK6_genome.inferseq_reference.log.err
    │       ├── GXDK6_contigs.GXDK6_genome.pair.benchmark.txt
    │       ├── GXDK6_contigs.GXDK6_genome.pair.log
    │       └── GXDK6_contigs.GXDK6_genome.pair.log.err
    └── GXDK6_genome.all_inferseq.txt
02.database
└── GXDK6_genome
    └── GXDK6_genome.database.benchmark.txt

6 directories, 44 files

How can I fix this?

durrantmm commented 2 years ago

Ok, sorry for the delayed response. It looks like there were no putative insertion sites identified, so the analysis failed. Try using more samples that are more diverged from the reference genome.

durrantmm commented 2 years ago

@wanqiangdehuoguo, you have the same problem. You analyzed only one genome, and no putative insertion sites were identified. I'll try to fix this bug so that MGEfinder properly reports when it finds no insertion sites.
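The diagnosis of no putative insertion sites can be checked directly from the 01.mgefinder output before the database step runs. Below is a minimal Python sketch that counts data rows in each 01.find and 03.inferseq TSV under 01.mgefinder/<genome>/<sample>/, the layout shown in the tree output above. It assumes each TSV begins with a single header line, which is not confirmed in this thread; summarize_mgefinder_tsvs and count_data_rows are hypothetical helpers, not part of MGEfinder.

```python
from pathlib import Path


def count_data_rows(tsv_path):
    """Count non-blank lines after the first line (ASSUMED to be a header)."""
    with open(tsv_path) as handle:
        lines = [line for line in handle if line.strip()]
    return max(len(lines) - 1, 0)


def summarize_mgefinder_tsvs(workdir, genome):
    """Map each 01.find / 03.inferseq TSV for one genome to its data-row count.

    Files are expected one level below 01.mgefinder/<genome>/, i.e. inside
    the per-sample directory (e.g. GXDK6_contigs/ in the tree output).
    """
    genome_dir = Path(workdir) / "01.mgefinder" / genome
    summary = {}
    for pattern in ("*/01.find.*.tsv", "*/03.inferseq_*.tsv"):
        for tsv in sorted(genome_dir.glob(pattern)):
            summary[tsv.name] = count_data_rows(tsv)
    return summary


if __name__ == "__main__":
    counts = summarize_mgefinder_tsvs("workdir", "GXDK6_genome")
    for name, n in counts.items():
        print(f"{n:6d}  {name}")
    if counts and all(n == 0 for n in counts.values()):
        print("All files contain only a header: no putative insertion sites found.")
```

All-zero counts would match the "No termini found in the input file..." line printed by makedatabase in the log above.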

camillevleal commented 2 years ago

Hi @krishna1925, I'm getting the same result. Did you manage to find out what happened, and what could be done to fix it?