BioJulia / FASTX.jl

Parse and process FASTA and FASTQ formatted files of biological sequences.
https://biojulia.dev
MIT License

Subsetting a FASTQ record #34

Closed tp2750 closed 3 years ago

tp2750 commented 4 years ago

Expected Behavior

It would be great to be able to subset a FASTQ record using normal string indexing syntax, like this:

s1[3:6]

Current Behavior

We can subset the sequence and the quality separately, like this:

julia> using FASTX
julia> s1 = first(open(FASTQ.Reader, "/tmp/s1.fq"))
julia> sequence(s1, 3:6)
4nt DNA Sequence:
AGTT
julia> quality(s1, 33, 3:6)
4-element Array{UInt8,1}:
 0x02
 0x24
 0x24
 0x24

But using normal string subsetting syntax does not work:

julia> s1[3:6]
ERROR: MethodError: no method matching getindex(::FASTX.FASTQ.Record, ::UnitRange{Int64})

It would be useful if this returned a new FASTQ record containing the sequence and quality for that range.

Possible Solution / Implementation

I would like to try adding a getindex method for this. Would you be willing to accept a pull request?
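To make the idea concrete, here is a minimal sketch of the requested behavior. It deliberately uses a hypothetical stand-in type (`ToyFastq`, not part of FASTX) so that it runs without the package; a real method would be defined on `FASTX.FASTQ.Record` and would have to work with that type's internal layout, which is not shown here.

```julia
# Hypothetical stand-in for a FASTQ record. This is NOT FASTX.FASTQ.Record;
# the field names and layout are assumptions made for illustration only.
struct ToyFastq
    identifier::String
    sequence::String          # assumed ASCII, so byte indices equal positions
    quality::Vector{UInt8}    # Phred scores with the offset already removed
end

# Slice the sequence and the quality with the same range and wrap the
# result in a new record, keeping the identifier unchanged.
function Base.getindex(rec::ToyFastq, r::UnitRange{Int})
    return ToyFastq(rec.identifier, rec.sequence[r], rec.quality[r])
end

# Made-up example data chosen so that positions 3:6 resemble the output above.
rec = ToyFastq("s1", "ACAGTTGA", UInt8[30, 30, 2, 36, 36, 36, 30, 30])
sub = rec[3:6]

println(sub.sequence)   # the four bases at positions 3 through 6
println(sub.quality)    # the matching four quality scores
```

With this example data, `rec[3:6]` carries the sequence `AGTT` and the qualities `0x02, 0x24, 0x24, 0x24`, so the two pieces stay aligned in one object. An out-of-range slice raises a `BoundsError` from the underlying indexing, as it would for a plain string or vector.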

(@v1.5) pkg> st FASTX
Status `/tmp/d1/Project.toml`
  [c2308a5c] FASTX v1.1.3

jakobnissen commented 4 years ago

I think it makes sense. Feel free to make a PR.

tp2750 commented 4 years ago

Please review the pull request.

If you are happy with the implementation, I will add a section to the documentation as well.