bhattlab / MGEfinder

A toolbox for identifying mobile genetic element (MGE) insertions from short-read sequencing data of bacterial isolates.
MIT License

No termini found in the input file... #29

Closed SamuelGreenrod closed 1 year ago

SamuelGreenrod commented 3 years ago

I'm running MGEfinder on my own data, following the pipeline described in the README. My inputs are a complete assembly called "Ancestor.fna", a sample assembly called "48con5.fna" built with Unicycler, and the BAM and BAM index (.bai) files produced by bwa mem followed by mgefinder formatbam. When I run it, I get this error message:

Parsing inferseq files
Combining the inferseq files...
Loading file 1/3: workdir/01.mgefinder/Ancestor/48con5/03.inferseq_assembly.48con5.Ancestor.tsv
Loading file 2/3: workdir/01.mgefinder/Ancestor/48con5/03.inferseq_reference.48con5.Ancestor.tsv
Loading file 3/3: workdir/01.mgefinder/Ancestor/48con5/03.inferseq_overlap.48con5.Ancestor.tsv
Deleting old database directory...
No termini found in the input file...
Waiting at most 5 seconds for missing files.
Error in job make_database while creating output files workdir/02.database/Ancestor/Ancestor.database.fna, workdir/02.database/Ancestor/Ancestor.database.fna.1.bt2.
MissingOutputException in line 192 of /users/steg500/.conda/envs/mgefinder/lib/python3.6/site-packages/mgefinder/workflow/denovo.original.Snakefile:
Missing files after 5 seconds:
workdir/02.database/Ancestor/Ancestor.database.fna
workdir/02.database/Ancestor/Ancestor.database.fna.1.bt2
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message

I've tried increasing the latency wait time, but the --latency-wait option isn't recognised when I pass it to mgefinder workflow denovo. Do you have any ideas how I could fix this? Thank you!

SamuelGreenrod commented 3 years ago

UPDATE: MGEfinder still isn't running on my own data, but it does run on the example data. This suggests there is a problem with my files, although I'm not sure what. I've tried to replicate the example data by: 1) using assemblies built with SPAdes using the command given in the instructions; 2) using a BAM file made with bwa mem and formatbam, as the instructions describe; 3) using a single-contig genome file (in my case, one downloaded from NCBI).

Please could you check what could cause the error message so I can change my input files accordingly? Thank you.

durrantmm commented 2 years ago

I apologize for losing track of this issue. Were you able to resolve this by chance?

NESmalley commented 2 years ago

@durrantmm @SamuelGreenrod I am experiencing this problem as well, having followed all the steps in the MGEfinder tutorial to prepare my own data. Like SamuelGreenrod, I can run the pipeline on the tutorial dataset without issue. My current snakemake version is 3.13.3. Thanks for any help!

durrantmm commented 2 years ago

How many different genomes did you use as input? The most likely cause is that it couldn't identify any potential insertion termini.

NESmalley commented 2 years ago

Only one genome, from evolved strains; I was test running with these files before we look at sequencing from a strain with a known phage. Is the "No termini found in the input file..." error what comes up if the fastas fed in have no novel insertions? Thank you for being at your computer just now :)

durrantmm commented 2 years ago

Yes, that's correct, it looks like it couldn't find any novel insertions.
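For anyone who wants to check this on their own run before the make_database step fails, here is a minimal sketch (not part of MGEfinder) that counts the data rows in the three inferseq files named in the log above. It assumes each file is a tab-separated table with exactly one header line; that layout, and the helper names `count_data_rows` and `summarize_inferseq`, are assumptions made for illustration. If all three files report 0 rows, that would be consistent with no candidate insertions having been found.

```python
from pathlib import Path


def count_data_rows(tsv_path):
    """Return the number of non-blank lines after the first (header) line.

    Assumption: the file is a TSV with exactly one header line.
    A missing file is reported as 0 rows.
    """
    path = Path(tsv_path)
    if not path.is_file():
        return 0
    with path.open() as handle:
        lines = [line for line in handle if line.strip()]
    return max(len(lines) - 1, 0)


def summarize_inferseq(paths):
    """Map each path (as a string) to its data-row count."""
    return {str(p): count_data_rows(p) for p in paths}


if __name__ == "__main__":
    # Paths copied from the log in this thread; adjust them to your own run.
    base = Path("workdir/01.mgefinder/Ancestor/48con5")
    files = [
        base / "03.inferseq_assembly.48con5.Ancestor.tsv",
        base / "03.inferseq_reference.48con5.Ancestor.tsv",
        base / "03.inferseq_overlap.48con5.Ancestor.tsv",
    ]
    for name, rows in summarize_inferseq(files).items():
        print(f"{rows:>6} rows  {name}")
```

Run it from the directory that contains workdir. A non-zero count in any file would suggest the problem lies elsewhere than a simple absence of insertions.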

NESmalley commented 2 years ago

Perfect! Thank you so much.