OpenGene / fastp

An ultra-fast all-in-one FASTQ preprocessor (QC/adapters/trimming/filtering/splitting/merging...)
MIT License

Unable to read from named pipes #517

Open wasade opened 1 year ago

wasade commented 1 year ago

I'm attempting to use named pipes with fastp but have so far been unsuccessful. The specific use case: I would like to perform some operations on the R1 and R2 data before running fastp, without going to disk in between.

The example below is minimal. From the strace output, it appears that fastp successfully reads at least the first read from r1 and r2. However, the program then hangs. After Ctrl-C, the shell reports both zcat jobs as done, which suggests the pipes have been fully consumed.

It may be that fastp is not detecting EOF in this case as expected.

$ mkfifo r1
$ mkfifo r2
$ zcat R1.trimmed.fastq.gz | head -n 400 > r1 &
$ zcat R2.trimmed.fastq.gz | head -n 400 > r2 &
$ strace fastp -i r1 -I r2 --html /dev/null --json /dev/null --stdout
...irrelevant output intentionally removed...

open("r1", O_RDONLY)                    = 3
fstat(3, {st_mode=S_IFIFO|0644, st_size=0, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ef862cf3000
read(3, "@sequence_id"..., 8388608) = 8192
read(3, "TCACGATCTTTTTTTT\n+\nB@@FFFFFHFHFG"..., 8380416) = 6058
read(3, "", 8372224)                    = 0
close(3)                                = 0
munmap(0x7ef862cf3000, 8192)            = 0
munmap(0x7ef86186d000, 8392704)         = 0
munmap(0x7ef86146c000, 4198400)         = 0
brk(NULL)                               = 0x55d5810a9000
brk(0x55d5818b3000)                     = 0x55d5818b3000
brk(NULL)                               = 0x55d5818b3000
brk(0x55d581cb3000)                     = 0x55d581cb3000
open("r2", O_RDONLY)                    = 3
fstat(3, {st_mode=S_IFIFO|0644, st_size=0, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ef862cf3000
read(3, "@sequence_id"..., 8388608) = 8192
read(3, "ATCTCGGTGGTAAGCG\n+\nBB@FFFDFHFHAH"..., 8380416) = 6058
read(3, "", 8372224)                    = 0
close(3)                                = 0
munmap(0x7ef862cf3000, 8192)            = 0
stat("r1", {st_mode=S_IFIFO|0644, st_size=0, ...}) = 0
stat("r1", {st_mode=S_IFIFO|0644, st_size=0, ...}) = 0
stat("r2", {st_mode=S_IFIFO|0644, st_size=0, ...}) = 0
stat("r2", {st_mode=S_IFIFO|0644, st_size=0, ...}) = 0
write(2, "Streaming uncompressed ", 23Streaming uncompressed ) = 23
write(2, "interleaved", 11interleaved)             = 11
write(2, " reads to STDOUT...", 19 reads to STDOUT...)     = 19
write(2, "\n", 1
)                       = 1
write(2, "Enable interleaved output mode f"..., 52Enable interleaved output mode for paired-end input.) = 52
write(2, "\n", 1
)                       = 1
write(2, "\n", 1
)                       = 1
open("r1", O_RDONLY^Cstrace: Process 42138 detached
 <detached ...>
[1]-  Done                    zcat R1.trimmed.fastq.gz | head -n 400 > r1
[2]+  Done                    zcat R2.trimmed.fastq.gz | head -n 400 > r2

niemasd commented 1 year ago

For additional context, I tried the same thing with process substitution (the <(...) syntax) instead of mkfifo, and I had the same results (fastp runs, but the output is empty):

fastp -i <(zcat R1.trimmed.fastq.gz | head -n 400) -I <(zcat R2.trimmed.fastq.gz | head -n 400) --html /dev/null --json /dev/null --stdout

However, when I create temporary files on disk instead of using named pipes, fastp works correctly, and I get the expected (non-empty) output:

zcat R1.trimmed.fastq.gz | head -n 400 > R1_sub.fq
zcat R2.trimmed.fastq.gz | head -n 400 > R2_sub.fq
fastp -i R1_sub.fq -I R2_sub.fq --html /dev/null --json /dev/null --stdout

So I agree with @wasade: fastp seems to break when its inputs are named pipes rather than regular files on disk.
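
One possible reading of the strace output in the first comment: fastp opens r1, reads it to EOF (the read that returns 0), and closes it, then does the same for r2, and finally calls open("r1", O_RDONLY) a second time. That last call never returns. This would be consistent with general FIFO behavior, where the data can be consumed only once and a later open for reading blocks until some process opens the pipe for writing again. The sketch below reproduces just that behavior outside fastp. It is not fastp code; the temporary FIFO path and the one-record input are made up for illustration, and it assumes bash and the timeout utility from GNU coreutils.

```shell
#!/usr/bin/env bash
# Sketch: a FIFO can be read to EOF once; a second open for reading
# blocks because no writer is left. All paths here are hypothetical.

tmpdir=$(mktemp -d)
fifo="$tmpdir/r1"
mkfifo "$fifo"

# Writer: sends one FASTQ record (four lines) into the FIFO, then exits.
printf '@read1\nACGT\n+\nFFFF\n' > "$fifo" &

# First pass: read until EOF. This returns once the writer closes its end.
first_pass=$(cat "$fifo")
wait

# Second pass: the writer is gone, so opening the FIFO for reading blocks.
# timeout kills cat after 1 second and then exits with status 124.
timeout 1 cat "$fifo" > /dev/null
second_status=$?

first_lines=$(printf '%s\n' "$first_pass" | wc -l)
echo "first pass read $first_lines lines"
echo "second pass exit status: $second_status"

rm -rf "$tmpdir"
```

If the second pass reports the timeout status, a program that needs to read its input twice cannot do so from a pipe, which would also explain why the on-disk files in the comment above work. Whether fastp actually reads its inputs twice is an inference from the strace output, not something confirmed in this thread.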