BioJulia / Kmers.jl

In development: Kmer types and methods for julia
MIT License
21 stars 7 forks source link

Kmers.jl is unable to comply with `BioSequences.has_interface` as currently implemented #33

Closed cjprybol closed 1 year ago

cjprybol commented 1 year ago

The problematic line is https://github.com/BioJulia/BioSequences.jl/blob/8d015fd44ce1e27fb8dbf5ec9b0f37dcece3d0cc/src/biosequence/biosequence.jl#L62

julia> syms = rand(ACGT, 31)
31-element Vector{DNA}:
 DNA_C
 DNA_A
 DNA_A
 DNA_C
 DNA_C
 DNA_A
 DNA_G
 DNA_T
 DNA_G
 DNA_C
 DNA_T
 DNA_G
 DNA_G
 DNA_A
 DNA_A
 DNA_C
 DNA_T
 DNA_C
 DNA_T
 DNA_A
 DNA_A
 DNA_G
 DNA_A
 DNA_T
 DNA_A
 DNA_G
 DNA_C
 DNA_G
 DNA_G
 DNA_T
 DNA_G

# these all work for LongSequences
julia> LongDNA{2}((i for i in syms))
31nt DNA Sequence:
CAACCAGTGCTGGAACTCTAAGATAGCGGTG

julia> LongDNA{2}(syms)
31nt DNA Sequence:
CAACCAGTGCTGGAACTCTAAGATAGCGGTG

julia> LongDNA{2}((syms))
31nt DNA Sequence:
CAACCAGTGCTGGAACTCTAAGATAGCGGTG

# only 2 of the 3 work for Kmers
julia> Kmers.kmertype(Kmer{DNAAlphabet{2},31})(syms)
DNA 31-mer:
CAACCAGTGCTGGAACTCTAAGATAGCGGTG

julia> Kmers.kmertype(Kmer{DNAAlphabet{2},31})((syms))
DNA 31-mer:
CAACCAGTGCTGGAACTCTAAGATAGCGGTG

julia> Kmers.kmertype(Kmer{DNAAlphabet{2},31})((i for i in syms))
ERROR: MethodError: no method matching getindex(::Base.Generator{Vector{DNA}, typeof(identity)}, ::Int64)
Stacktrace:
 [1] Kmer{DNAAlphabet{2}, 31, 1}(itr::Base.Generator{Vector{DNA}, typeof(identity)})
   @ Kmers ~/workspace/Kmers.jl/src/kmer.jl:159
 [2] top-level scope
   @ REPL[15]:1

Because we hit the method error, we go to https://github.com/BioJulia/BioSequences.jl/blob/8d015fd44ce1e27fb8dbf5ec9b0f37dcece3d0cc/src/biosequence/biosequence.jl#L76 and return false

Based on the above, it seems that there are 2 options for how to resolve this:

  1. implement missing functionality in Kmers.jl to allow building from a generator
  2. modify the has_interface line to use pre-collected (syms) rather than putting them back into a generator/iterator ((i for i in syms))

I guess the key question is whether being able to construct a BioSequence from an iterator/generator is part of the interface specification and this is the intended result of the test or whether the test condition could/should be loosened

Thoughts?

jakobnissen commented 1 year ago

The idea behind using a generator in has_interface was to explicitly test that it was constructable from an arbitrary iterable of the right biosymbol, in particular this line from the extended docstring of BioSequence:

  • T must be able to be constructed from any iterable with length defined and with a known, compatible element type.

I think this should just be added to Kmers.jl.

jakobnissen commented 1 year ago

I've tranferred this issue to Kmers.jl, since it's a bug in Kmers, not BioSequences