bhattlab / MGEfinder

A toolbox for identifying mobile genetic element (MGE) insertions from short-read sequencing data of bacterial isolates.
MIT License

Problem with clusterseq #16

Closed: mrozwandowicz closed this issue 3 years ago.

mrozwandowicz commented 3 years ago

Hi,

I am very excited about using mgefinder, but so far I cannot make it work. I successfully ran the script on a test dataset. However, multiple attempts with different isolates and different reference genomes all gave me the same error:

Error in job clusterseq while creating output file workdir/03.results/R27/01.clusterseq.R27.tsv.
RuleException:
CalledProcessError in line 259 of /home/rozwandm/.conda/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/denovo.original.Snakefile:
Command 'mgefinder clusterseq -minsize 70 -maxsize 200000 --threads 1 --memory 16000 workdir/01.mgefinder/R27/R27.all_inferseq_database.txt -o workdir/03.results/R27/01.clusterseq.R27.tsv' returned non-zero exit status 1.
  File "/home/rozwandm/.conda/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/denovo.original.Snakefile", line 259, in __rule_clusterseq
  File "/home/rozwandm/.conda/envs/mgefinder/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message

Thank you in advance for your help, Marta

durrantmm commented 3 years ago

Hello!

Thanks for asking this.

What error do you get when you run:

mgefinder clusterseq -minsize 70 -maxsize 200000 --threads 1 --memory 16000 workdir/01.mgefinder/R27/R27.all_inferseq_database.txt -o workdir/03.results/R27/01.clusterseq.R27.tsv

when the mgefinder conda environment is activated?
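If you want to script this debugging step, here is a minimal Python sketch: it runs the failing rule's command outside Snakemake and keeps only the last non-empty line of stderr, which for a Python tool that crashes with an uncaught exception is normally the exception type and message. The helper `run_and_report` and the constant `CLUSTERSEQ_CMD` are hypothetical names, not part of MGEfinder; the command arguments are copied from the error log, and the script assumes `mgefinder` is only on `PATH` once the mgefinder conda environment is activated.

```python
import shutil
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd (a list of arguments) and return (exit_code, last_stderr_line).

    For a Python CLI that dies with an uncaught exception, the last non-empty
    line of stderr is normally the exception type and message.
    """
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          universal_newlines=True)
    lines = [line for line in proc.stderr.splitlines() if line.strip()]
    return proc.returncode, (lines[-1] if lines else "")


# The clusterseq command from the Snakemake error log, split into arguments.
CLUSTERSEQ_CMD = [
    "mgefinder", "clusterseq", "-minsize", "70", "-maxsize", "200000",
    "--threads", "1", "--memory", "16000",
    "workdir/01.mgefinder/R27/R27.all_inferseq_database.txt",
    "-o", "workdir/03.results/R27/01.clusterseq.R27.tsv",
]

if __name__ == "__main__":
    if shutil.which("mgefinder") is None:
        # Assumption: the tool is only on PATH inside the mgefinder conda env.
        print("mgefinder not found on PATH; run `conda activate mgefinder` first",
              file=sys.stderr)
    else:
        code, last = run_and_report(CLUSTERSEQ_CMD)
        print(f"exit status {code}: {last}")
```

Capturing stderr this way surfaces the underlying Python exception that Snakemake's `CalledProcessError` summary hides.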

mrozwandowicz commented 3 years ago

I got this error message:

IndexError: list index out of range

durrantmm commented 3 years ago

This is a strange one. How many samples are you analyzing?

mrozwandowicz commented 3 years ago

I always tried one sample at a time. Because I kept having problems and thought the reference genome might be the cause, I tried multiple references, but always one at a time.

durrantmm commented 3 years ago

Great, could you please send me one of the samples that failed so I can test it? Thanks.

mrozwandowicz commented 3 years ago

I sent all of the files to this email address: mdurrant@stanford.edu

durrantmm commented 3 years ago

Thanks for sending over your files.

I was able to recreate the error you found. The issue was with how you named the files in your working directory. Please refer to the tutorial to see exactly how to set up your working directory.

You had it set up like this:

workdir/
├── 00.assembly/
│   └── 15S04545-4.fasta.fna
├── 00.bam/
│   ├── 15S04545-4.bam
│   └── 15S04545-4.bam.bai
└── 00.genome/
    └── R27.fna

I fixed it so that it looked like this:

workdir/
├── 00.assembly/
│   └── 15S04545-4.fna
├── 00.bam/
│   ├── 15S04545-4.R27.bam
│   └── 15S04545-4.R27.bam.bai
└── 00.genome/
    └── R27.fna

And that fixed it.

I hope that helps! Make sure you include all of your isolates in a single working directory.
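For checking many isolates before launching the workflow, here is a minimal sketch of a layout checker. It assumes the naming convention implied by the corrected example in this thread (`00.assembly/<sample>.fna`, `00.bam/<sample>.<genome>.bam` with a matching `.bam.bai` index, `00.genome/<genome>.fna`) and that genome names contain no dots. The function `check_workdir` is hypothetical and not part of MGEfinder; the tutorial remains the authority on naming rules.

```python
import sys
from pathlib import Path

# ASSUMED layout, inferred from the corrected example in this thread
# (the MGEfinder tutorial is the authority):
#   00.assembly/<sample>.fna
#   00.bam/<sample>.<genome>.bam  plus  <sample>.<genome>.bam.bai
#   00.genome/<genome>.fna
# Genome names are assumed to contain no dots; sample names may.


def _stems(directory, suffix):
    """Names of files in directory ending with suffix, with the suffix removed."""
    if not directory.is_dir():
        return set()
    return {p.name[:-len(suffix)] for p in directory.iterdir()
            if p.name.endswith(suffix)}


def check_workdir(workdir):
    """Return a list of human-readable problems found in workdir."""
    root = Path(workdir)
    problems = [f"missing directory {sub}/"
                for sub in ("00.assembly", "00.bam", "00.genome")
                if not (root / sub).is_dir()]
    assemblies = _stems(root / "00.assembly", ".fna")
    genomes = _stems(root / "00.genome", ".fna")
    for stem in sorted(_stems(root / "00.bam", ".bam")):
        bam = f"{stem}.bam"
        # Split on the LAST dot: "15S04545-4.R27" -> ("15S04545-4", "R27").
        sample, sep, genome = stem.rpartition(".")
        if not sep:
            problems.append(f"{bam}: expected <sample>.<genome>.bam")
            continue
        if genome not in genomes:
            problems.append(f"{bam}: no matching 00.genome/{genome}.fna")
        if sample not in assemblies:
            problems.append(f"{bam}: no matching 00.assembly/{sample}.fna")
        if not (root / "00.bam" / f"{bam}.bai").exists():
            problems.append(f"{bam}: missing index {bam}.bai")
    return problems


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "workdir"
    found = check_workdir(target)
    for problem in found:
        print(problem)
    if not found:
        print(f"{target}: layout matches the assumed convention")
```

Run against the original layout from this thread, it reports `15S04545-4.bam: expected <sample>.<genome>.bam`; against the corrected layout it reports no problems.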