gbouras13 / plassembler

Program to quickly and accurately assemble plasmids in hybrid and long-only sequenced bacterial isolates
MIT License

Pipeline failed No plasmid_long.fastq output #43

Closed nbat64 closed 5 months ago

nbat64 commented 7 months ago

Describe the bug

Hello, thank you for your pipeline. I was able to run it without problems on some samples with a previous version (August). I have upgraded to version 1.5.0 and ran it with the following options:

plassembler run \
--database $DATABASE \
--longreads ${SAMPLE}.fastq \
--short_one ${SAMPLE}_R1.fastq \
--short_two ${SAMPLE}_R2.fastq \
--chromosome 4000000 \
--threads 48 \
--prefix $SAMPLE \
--force \
--min_quality $QUALITY \
--min_length $LENGTH \
--keep_chromosome \
--keep_fastqs \
--outdir ${SAMPLE}_output/

For several samples, the pipeline failed, apparently after Flye, with the following error message:

2023-12-04 16:12:09.304 | INFO     | plassembler:run:701 - Extracting Chromosome.
Traceback (most recent call last):
  File "/soft/2019013/Conda_env/plassembler_env/lib/python3.9/shutil.py", line 825, in move
    os.rename(src, real_dst)
FileNotFoundError: [Errno 2] No such file or directory: 'IGLE11_output/plasmid_long.fastq' -> 'IGLE11_output/plasmid_fastqs/plasmids_long.fastq'

No plasmid_long.fastq file was produced. Do you know what could cause this error for some samples? Could it be due to the quality of the Nanopore reads, or to the type of basecalling used (super mode, min qscore of 7)?

Thanks for the help and advice.

Regards,

Nicolas

gbouras13 commented 7 months ago

Hi @nbat64 ,

I have run a few tests and have been unable to replicate this error on my laptop.

I highly doubt it is due to the quality or base calling.

Are you running this on a cluster/HPC? The error is a plain "file does not exist" error, so it might come from file system latency or a similar issue with your system. For example, the file system might be too slow to create the output directory 'plasmid_fastqs' before the file is moved into it.
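To make the latency hypothesis concrete, here is a minimal sketch of a move that tolerates a slow shared file system. It is not Plassembler's actual code: the function name `move_with_retry` and its `attempts` and `delay` parameters are hypothetical, chosen only for illustration. It creates the destination directory first, then retries the move a few times if the source is not visible yet.

```python
import os
import shutil
import time


def move_with_retry(src, dst, attempts=5, delay=1.0):
    """Move src to dst, retrying if the source is not visible yet.

    Hypothetical helper for illustration only; not part of Plassembler.
    """
    # Create the destination directory up front. exist_ok=True avoids an
    # error if the directory already exists.
    dst_dir = os.path.dirname(dst)
    if dst_dir:
        os.makedirs(dst_dir, exist_ok=True)

    for attempt in range(1, attempts + 1):
        try:
            return shutil.move(src, dst)
        except FileNotFoundError:
            # On the last attempt, give up and surface the original error.
            if attempt == attempts:
                raise
            # Otherwise wait for the file system to catch up and try again.
            time.sleep(delay)
```

If the source file was never written at all, retrying will not help and the error is raised after the last attempt. That outcome would point to an earlier step of the pipeline instead of the file system.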

George

nbat64 commented 7 months ago

Hi George,

Thanks for your message. Yes, I am running it on a cluster (array job). What is strange is that some of my samples go through the whole pipeline without error:

2023-12-04 16:06:35.935 | INFO     | plassembler:run:736 - Chromosome Identified. Plassembler will now use long and short reads to assemble plasmids accurately.
2023-12-04 16:06:35.936 | INFO     | plassembler:run:738 - Mapping long reads.

Is it a problem with the mapping of ONT reads, or earlier, with the identification of chromosomes (plassembler:run:736)? I will try with another value for -c and without --keep_fastqs.

Thanks.

gbouras13 commented 7 months ago

Hi @nbat64 ,

With the first/original issue, if you re-run on the same samples, do you consistently get the issue? If not then I'd put it down to HPC problems. I've come across these filesystem issues running programs on HPC before (albeit not plassembler).
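One way to run that check systematically is sketched below, assuming the same command is simply repeated a few times per sample. The helper names `rerun_and_tally` and `classify` are hypothetical and not part of Plassembler. The sketch records the exit code of each run and labels the sample as always failing, always succeeding, or intermittent.

```python
import subprocess


def rerun_and_tally(cmd, runs=3):
    """Run cmd (a list of arguments) several times and return the exit codes."""
    codes = []
    for _ in range(runs):
        # capture_output keeps the tool's logs out of the terminal.
        result = subprocess.run(cmd, capture_output=True, text=True)
        codes.append(result.returncode)
    return codes


def classify(codes):
    """Label a series of exit codes as consistent or intermittent."""
    failures = sum(1 for code in codes if code != 0)
    if failures == 0:
        return "always succeeds"
    if failures == len(codes):
        return "always fails"
    return "intermittent"
```

Here `cmd` would be the same `plassembler run` invocation as in the original report, passed as a list of arguments. An "intermittent" result on identical input would be more consistent with a file system problem than with the reads themselves.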

Trying a lower -c value and not keeping fastqs won't hurt if you don't need them.

George

nbat64 commented 7 months ago

Hi George, it is indeed a newly built cluster, and I have had some weird behaviour with other scripts too... I will run more tests on the HPC and outside it. Thanks, Nicolas

gbouras13 commented 5 months ago

Closing due to time.