epi2me-labs / fastcat

Simple utility to concatenate .fastq(.gz) files whilst creating a summary of the sequences.
https://labs.epi2me.io/

fastcat gets stuck in loop and concatenates endlessly #7

Open lucast122 opened 3 weeks ago

lucast122 commented 3 weeks ago

Dear devs, I was looking for a more efficient way to concatenate fastq files and was happy to find this efficient implementation. I'm not sure if this is a bug or just wrong usage on my part, but I decided to report it anyway in case it helps.

I just ran fastcat -f fastcat_summary.txt -q 10 -r read_fastcat_summary.txt -s test . > concat_reads.fastq

inside a directory containing only fastq.gz files. It seemed to work fine at first, but after concatenating all the files it never stopped, and I noticed that it had started to concatenate its own output as well.

Like this:
Processing ./FAV63646_pass_cfbcf7b8_a2aa673f_2130.fastq.gz
Processing ./FAV63646_pass_cfbcf7b8_a2aa673f_2780.fastq.gz
Processing ./concat_reads.fastq

I stopped it when the output file reached 100 GB, even though the combined reads are only 2 GB in size.

When I output to a different directory it worked without any issues:
fastcat -f fastcat_summary.txt -q 10 -r read_fastcat_summary.txt -s test . > output/concat_reads.fastq

I'm running Ubuntu 20.04.6 and installed it today inside a fresh environment using mamba.

Maybe you need to reflect this in the usage info and README or patch it, since right now one might assume that this should work as stated in the README. Hope this is useful and thanks for providing this tool, I like the idea of combining concatenation with making the read stats for efficiency!

EDIT: The issue also happens when outputting to stdout instead of writing to a file

cjw85 commented 3 weeks ago

Hi @lucast122,

Thank you for reporting this. When you run:

fastcat ... . > output.fastq

the shell creates the output.fastq file before fastcat starts, and so fastcat sees it as a file in its input directory (.). I believe it's possible for fastcat to introspect where its output has been directed, so it may be possible to explicitly exclude the output file from the files that fastcat processes. We will take a look at implementing this.
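
For illustration only, here is a minimal Python sketch of that idea. It is not fastcat's implementation: the function names `is_same_file` and `list_inputs` and the suffix list are hypothetical, and it assumes a POSIX-like system where a file is identified by its device and inode numbers. It compares each candidate input against whatever file stdout currently points to and skips the match.

```python
import os
import sys


def is_same_file(path, fd):
    """Return True if `path` refers to the same file as the open descriptor `fd`."""
    try:
        a = os.stat(path)
        b = os.fstat(fd)
    except OSError:
        return False
    # On POSIX systems, (device, inode) uniquely identifies a file.
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


def list_inputs(directory, out_fd=None):
    """List FASTQ-like files in `directory`, skipping the file `out_fd` writes to.

    `out_fd` defaults to the process's stdout (hypothetical helper, for illustration).
    """
    if out_fd is None:
        out_fd = sys.stdout.fileno()
    suffixes = (".fastq", ".fastq.gz", ".fq", ".fq.gz")  # assumed set of extensions
    inputs = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not name.endswith(suffixes) or not os.path.isfile(path):
            continue
        if is_same_file(path, out_fd):
            # This is our own redirected output; reading it would never terminate.
            continue
        inputs.append(path)
    return inputs
```

With a check like this, a script invoked as `script.py . > concat_reads.fastq` would list the .fastq.gz files in the directory but leave out concat_reads.fastq, because the shell has already opened that file as the script's stdout.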