BioJulia / Indexes.jl

MIT License

Handling of chromosomes not in index #9

Open rulixxx opened 2 years ago

rulixxx commented 2 years ago

Currently, looking up a chromosome that is not in a file's index fails with an error in Indexes.overlapchunks because seqid is nothing. findfirst returns nothing, not 0, when there is no match, so the if condition does not catch it:

function overlapchunks(tabix::Tabix, interval::Interval)
    seqid = findfirst(isequal(BioGenerics.seqname(interval)), tabix.names)
    if seqid == 0
        throw(ArgumentError("failed to find sequence name '$(BioGenerics.seqname(interval))'"))
    end
    return overlapchunks(tabix.index, seqid, BioGenerics.leftposition(interval):BioGenerics.rightposition(interval))
end

I suggest changing how this situation is handled altogether. When the chromosome is not found in tabix.names, the function should return an empty list of chunks. This would make it more natural to handle sex chromosomes (just as the tabix utility does).
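
A minimal sketch of the proposed behaviour, for illustration only. It does not use the Indexes.jl types: Chunk, chunks_or_empty, and the sample data are hypothetical stand-ins, and the only assumption carried over from the issue is that the name lookup is done with findfirst.

```julia
# Hypothetical stand-in for a chunk; not the type used by Indexes.jl.
const Chunk = Tuple{Int,Int}

# Return the chunks stored for `name`, or an empty list if `name` is unknown.
function chunks_or_empty(names::Vector{String}, chunks::Vector{Vector{Chunk}}, name::AbstractString)
    seqid = findfirst(isequal(name), names)
    # findfirst returns `nothing` (not 0) when there is no match.
    seqid === nothing && return Chunk[]
    return chunks[seqid]
end

# Made-up sample data: two sequences with one chunk each.
names = ["chr1", "chrX"]
chunks = [[(0, 100)], [(100, 250)]]

found = chunks_or_empty(names, chunks, "chrX")
missing_chunks = chunks_or_empty(names, chunks, "chrY")
```

With this shape, a caller looping over chromosomes can iterate over the result without special-casing names that are absent from the index.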

CiaranOMara commented 2 years ago

Are you able to open a PR with seqid === nothing and contribute some unit tests? If there were a hotfix for this issue, it would get merged into the v0.1 branch.
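
A rough sketch of what such a hotfix and its tests might look like, keeping the current throwing behaviour. find_seqid is a hypothetical helper that isolates the name lookup so the example runs on its own; the real method takes a Tabix and an Interval.

```julia
using Test

# Hypothetical helper isolating the lookup step of overlapchunks.
function find_seqid(names::Vector{String}, name::AbstractString)
    seqid = findfirst(isequal(name), names)
    # Compare against `nothing`; findfirst never returns 0 for a Vector.
    if seqid === nothing
        throw(ArgumentError("failed to find sequence name '$name'"))
    end
    return seqid
end

@testset "sequence name lookup" begin
    names = ["chr1", "chr2"]
    @test find_seqid(names, "chr2") == 2
    @test_throws ArgumentError find_seqid(names, "chrY")
end
```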

Regarding the return type, I suspect the motivation for the current behaviour was to adopt Base's iterator interface and other finding patterns. This return behaviour could be revisited for v0.3 after we get CodecBGZF up and running.