BioJulia / FASTX.jl

Parse and process FASTA and FASTQ formatted files of biological sequences.
https://biojulia.dev
MIT License
61 stars 20 forks source link

read! doesn't work in v2.0.0? #95

Closed banhbio closed 1 year ago

banhbio commented 1 year ago

Thank you for building a nice package.

When I migrated from FASTX.jl v1.3.0 to v2.0.0, an error occurred due to the behavior of read! I couldn't find any docs or tests on read! in v2.0.0, so I'm confused about whether this is a deprecated or a bug.

Expected Behavior

In v1.3.0

julia> open(FASTQ.Reader, "./test/test.fastq") do x
           r = FASTQ.Record()
           while !eof(x)
               read!(x,r)
               @show r
           end
       end

returns

r = FASTX.FASTQ.Record:
   identifier: ABC
  description: DEF
     sequence: TAGC
      quality: UInt8[0x29, 0x29, 0x29, 0x29]

Current Behavior

But, in v2.0.0, the code above returns nothing.

r = FASTQ.Record:
  description: ""
     sequence: ""
      quality: ""

Steps to Reproduce (for bugs)

I got the test file from here

Context

In ./src/fastq/reader.jl, read! seems to do nothing for rec.

function Base.read!(rdr::Reader, rec::Record)
    (cs, f) = _read!(rdr, rdr.record)
    if !f
        cs == 0 && throw(EOFError())
        throw(ArgumentError("malformed FASTQ file"))
    end    
    return rec
end

Is it expected behavior or a bug? best.

banhbio commented 1 year ago

Hi. I simply looked at ./src/fasta/reader.jl and the code was different. read! for fasta completely works. I think read! for fastq has a bug.

function Base.read!(rdr::Reader, rec::Record)
    (cs, f) = _read!(rdr, rec)
    if !f
        cs == 0 && throw(EOFError())
        throw(ArgumentError("malformed FASTA file"))
    end    
    return rec
end
jakobnissen commented 1 year ago

@banhbio You are completely right, this is a bug. I've made fix, added tests so it doesn't regress in the future, and will release version 2.0.1 with the fix in a few hours.

For what it's worth, in version 2, it's preferred to use a regular for loop to iterate over readers:

for record in reader
    [do something]
end

Instead of the old read! trick:

record = FASTQ.Record()
while !eof(reader)
    read!(reader, record)
    [do something]
end

In 99% of cases, the slight performance boost of the latter will not matter. When it does matter, you can initialize the reader with the keyword copy=false, as in open(FASTQ.Reader, "file.fastq"; copy=false), then loop. This is just as fast as the old read! idiom, but more pleasant.

banhbio commented 1 year ago

@jakobnissen Thanks for your response and great advice!