BioJulia / FASTX.jl

Parse and process FASTA and FASTQ formatted files of biological sequences.
https://biojulia.dev
MIT License

Reading fastq from TranscodingStream #8

Status: Closed (blaiseli closed this issue 5 years ago)

blaiseli commented 5 years ago

I would like to parse gzipped fastq files through a TranscodingStream. As a first step, I'm trying with a non-gzipped file, using NoopStream.

Based on examples, I came up with the following approach:

julia> using FASTX

julia> using TranscodingStreams

julia> fq_filename = "/tmp/test_qaf_demux_proc/Undetermined.fq"
"/tmp/test_qaf_demux_proc/Undetermined.fq"

julia> open(NoopStream, fq_filename) do stream
           reader = FASTQ.Reader(stream)
       end
ERROR: UndefVarError: stream not defined
Stacktrace:
 [1] #Reader#5(::Nothing, ::Type{FASTX.FASTQ.Reader}, ::TranscodingStream{Noop,IOStream}) at /home/bli/.julia/packages/FASTX/PilAI/src/fastq/reader.jl:27
 [2] FASTX.FASTQ.Reader(::TranscodingStream{Noop,IOStream}) at /home/bli/.julia/packages/FASTX/PilAI/src/fastq/reader.jl:19
 [3] (::getfield(Main, Symbol("##9#10")))(::TranscodingStream{Noop,IOStream}) at ./REPL[11]:2
 [4] open(::getfield(Main, Symbol("##9#10")), ::Type{TranscodingStream{Noop,S} where S<:IO}, ::String) at /home/bli/.julia/packages/TranscodingStreams/MsN8d/src/stream.jl:161
 [5] top-level scope at REPL[11]:1

The following works:

julia> open(NoopStream, fq_filename) do stream
           for line in eachline(stream)
               println(line)
           end
       end

So I don't understand why I get ERROR: UndefVarError: stream not defined when, instead of reading from the stream, I just pass it to a FASTQ.Reader.

I also tried as follows:

julia> stream = open(NoopStream, fq_filename)
       reader = FASTQ.Reader(stream)
ERROR: MethodError: no method matching open(::Type{TranscodingStream{Noop,S} where S<:IO}, ::String)
Closest candidates are:
  open(::AbstractString, ::AbstractString) at iostream.jl:345
  open(::Function, ::Any...; kwargs...) at iostream.jl:373
  open(::Base.AbstractCmd, ::AbstractString) at process.jl:626
  ...
Stacktrace:
 [1] top-level scope at REPL[12]:1

I'm new to Julia, so I'm not sure I'm interpreting the error messages and source code correctly, but isn't FASTQ.Reader supposed to accept a TranscodingStream?

function Reader(input::IO; fill_ambiguous = nothing)
    if fill_ambiguous === nothing
        seq_transform = nothing
    else
        seq_transform = generate_fill_ambiguous(fill_ambiguous)
    end
    if !(input isa TranscodingStream)
        stream = TranscodingStreams.NoopStream(input)
    end
    return Reader(State(stream, 1, 1, false), seq_transform)
end

Also, isn't my first error caused by stream never being assigned in the above code when input isa TranscodingStream is true?
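To illustrate the suspicion above, here is a minimal sketch that reproduces the same pattern with Base only. The names Wrapped, buggy_wrap and fixed_wrap are hypothetical stand-ins, not FASTX code: Wrapped plays the role of TranscodingStream, and the functions mimic the shape of the constructor quoted above. The fixed variant is just one possible way to assign stream on both branches, not necessarily what the library does.

```julia
# Hypothetical stand-in for TranscodingStream; not part of FASTX or TranscodingStreams.
struct Wrapped <: IO
    inner::IO
end

# Same shape as the quoted constructor: `stream` is only assigned
# when the input is NOT already wrapped.
function buggy_wrap(input::IO)
    if !(input isa Wrapped)
        stream = Wrapped(input)
    end
    return stream  # throws UndefVarError when input isa Wrapped
end

# One possible fix: assign `stream` on both branches.
function fixed_wrap(input::IO)
    stream = input isa Wrapped ? input : Wrapped(input)
    return stream
end

raw = IOBuffer("@r1\nACGT\n+\nIIII\n")
already_wrapped = Wrapped(raw)

fixed_wrap(raw)              # wraps the raw IO
fixed_wrap(already_wrapped)  # returns the input unchanged
buggy_wrap(raw)              # works, because the branch is taken
# buggy_wrap(already_wrapped) would throw an UndefVarError for `stream`
```

In this sketch the failure only appears for inputs that are already wrapped, which matches the first error: the do-block passes an existing TranscodingStream, so the assignment is skipped.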


blaiseli commented 5 years ago

The pull request seems to fix the error obtained using the first approach (through do ... end).

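The second approach failed because I was not using NoopStream correctly: as the MethodError shows, open(NoopStream, fq_filename) has no method outside the do-block form, so the file has to be opened first and then wrapped.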

The following works:

julia> stream = NoopStream(open(fq_filename))
TranscodingStream{Noop,IOStream}(<mode=idle>)

julia> reader = FASTQ.Reader(stream)
FASTX.FASTQ.Reader{TranscodingStream{Noop,IOStream}}(BioGenerics.Automa.State{TranscodingStream{Noop,IOStream}}(TranscodingStream{Noop,IOStream}(<mode=idle>), 1, 1, false), nothing)
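For the original goal of reading gzipped fastq files, the same wrapping pattern should carry over. The following is a sketch, assuming the CodecZlib.jl package is installed and using its GzipDecompressorStream; the helper name count_records and the file path are hypothetical, and this has not been checked against the FASTX version used in this thread.

```julia
using FASTX
using CodecZlib  # assumption: provides GzipDecompressorStream

# Hypothetical helper: count the records in a gzipped fastq file.
function count_records(path::AbstractString)
    # Open the file first, then wrap it, as with NoopStream above.
    reader = FASTQ.Reader(GzipDecompressorStream(open(path)))
    try
        n = 0
        for _ in reader
            n += 1
        end
        return n
    finally
        close(reader)  # release the reader and its underlying stream
    end
end

# Usage, with a hypothetical path:
# count_records("/tmp/example.fq.gz")
```

Since GzipDecompressorStream is itself a TranscodingStream, this goes through the same branch of the constructor that raised the first error, so it depends on that fix being in place.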