lucast122 opened 3 weeks ago
Hi @lucast122,
Thank you for reporting this. When you run:

```
fastcat ... . > output.fastq
```

the shell creates `output.fastq` before fastcat starts, so fastcat sees it as a file in its input directory (`.`). I believe it's possible for fastcat to introspect where its output has been directed, so it may be possible to explicitly exclude the output file from the files which fastcat processes; we will take a look at implementing this.
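For illustration, the check could be as simple as recording the device/inode pair of stdout at startup and skipping any input that matches it. Below is a minimal sketch in C (fastcat's implementation language); the function names and structure here are hypothetical, for illustration only, and not fastcat's actual code:

```c
#include <stdio.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/* Device/inode pair identifying where our output is going. */
static struct stat out_st;
static bool out_st_valid = false;

/* Call once at startup: remember what stdout points at. Only arm the
 * check when stdout is a regular file (i.e. a redirection target);
 * pipes and terminals can never collide with an input path. */
void remember_output_file(void) {
    out_st_valid = (fstat(STDOUT_FILENO, &out_st) == 0)
                   && S_ISREG(out_st.st_mode);
}

/* Returns true if `path` is the same underlying file as our output,
 * i.e. the redirection target sitting inside the input directory. */
bool is_output_file(const char *path) {
    struct stat st;
    if (!out_st_valid || stat(path, &st) != 0)
        return false;
    /* Same device + inode means `path` is the very file we write to. */
    return st.st_dev == out_st.st_dev && st.st_ino == out_st.st_ino;
}

/* Sketch of the per-file loop: skip the output file instead of
 * reading it back in and looping forever. */
void process_entry(const char *path) {
    if (is_output_file(path)) {
        fprintf(stderr, "Skipping output file %s\n", path);
        return;
    }
    fprintf(stderr, "Processing %s\n", path);
    /* ... read the file and emit records to stdout here ... */
}

int main(int argc, char **argv) {
    remember_output_file();
    for (int i = 1; i < argc; i++)
        process_entry(argv[i]);
    return 0;
}
```

Comparing `st_dev` and `st_ino` is the standard way to test whether two paths refer to the same underlying file, and the `S_ISREG` guard keeps the check inert when output goes to a terminal or a pipe.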
Dear devs, I was looking for a more efficient way to concatenate fastq files and was happy to find this implementation. I'm not sure if this is a bug or just wrong usage on my part, but I decided to report it anyway in case it helps.
I just ran:

```
fastcat -f fastcat_summary.txt -q 10 -r read_fastcat_summary.txt -s test . > concat_reads.fastq
```

inside a directory containing only fastq.gz files. It seemed to work fine at first, but after concatenating all the files it never stopped, and I noticed that it had started to concatenate its own output as well:

```
Processing ./FAV63646_pass_cfbcf7b8_a2aa673f_2130.fastq.gz
Processing ./FAV63646_pass_cfbcf7b8_a2aa673f_2780.fastq.gz
Processing ./concat_reads.fastq
```

I stopped it when the file size reached 100GB (the combined reads are only 2GB in size).
When I output to a different directory it worked without any issues (presumably because the output file then never shows up among the scanned inputs):

```
fastcat -f fastcat_summary.txt -q 10 -r read_fastcat_summary.txt -s test . > output/concat_reads.fastq
```
I'm running Ubuntu 20.04.6 and installed it today inside a fresh environment using mamba.
Maybe you need to reflect this in the usage info and README, or patch it, since right now one might assume from the README that this should just work. Hope this is useful, and thanks for providing this tool; I like the idea of combining concatenation with computing read stats for efficiency!
EDIT: The issue also happens when outputting to stdout instead of writing to a file.