BioJulia / BioSequences.jl

Biological sequences for the Julia language
http://biojulia.dev/BioSequences.jl
MIT License

Document BioSequence interface(s) #328

Open kescobo opened 3 hours ago

kescobo commented 3 hours ago

I thought about this because of this comment from #322:

This commit adds new methods for findnext and findprev for SeqOrView with known alphabets,...

It's only implemented for known alphabets because new alphabets may overload == in surprising ways, which makes the bitparallel ops invalid.
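To make the concern in that commit message concrete, here is a minimal Python sketch of a bit-parallel symbol search over a 2-bit packed sequence. It is not the BioSequences.jl implementation; the names `CODE`, `pack`, and `find_first` are hypothetical and exist only for illustration. The point is that the search compares encoded bit patterns with XOR and never calls a symbol-level equality function, so an alphabet whose `==` treats two different encodings as equal would get wrong answers.

```python
BITS = 2                                  # bits per symbol (assumed 2-bit DNA encoding)
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}   # hypothetical encoding table


def pack(seq):
    """Pack symbols into one integer, symbol i in bits [2i, 2i+1]."""
    word = 0
    for i, nt in enumerate(seq):
        word |= CODE[nt] << (i * BITS)
    return word


def find_first(word, n, symbol):
    """Return the index of the first `symbol` among n packed symbols, or -1."""
    low = sum(1 << (i * BITS) for i in range(n))  # 0b01 in every 2-bit field
    diff = word ^ (CODE[symbol] * low)            # a field is 00 iff the encodings match
    nonzero = (diff | (diff >> 1)) & low          # low bit set iff the field is nonzero
    hits = ~nonzero & low                         # low bit set iff the field matches
    if hits == 0:
        return -1
    lowest = hits & -hits                         # isolate the lowest set bit
    return (lowest.bit_length() - 1) // BITS


packed = pack("ACGTG")
first_g = find_first(packed, 5, "G")
```

All fields are tested at once with a handful of integer operations, which is where the speedup comes from, and also why the result is only valid when "equal symbols" means "identical bits".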

Possibly related to #296 and #140?

I know we've had this conversation before, but I couldn't find it in the issues. On the one hand, it would be cool if people could define their own alphabets and still take advantage of all of these optimizations. On the other hand, I can't really think of what those alternative alphabets might be, and anyone advanced enough to want them can probably figure it out.

On the third hand :octopus:, documenting all of this better might make it more feasible for others to help maintain the package, even if no extensions are desired. I am mindful of your development time though, @jakobnissen, and I know documentation is often a low priority (you do a great job prioritizing it for user-facing stuff).

jakobnissen commented 3 hours ago

One common type of custom alphabet is a reduced amino acid alphabet. For example, leucine and isoleucine (and other similar pairs) can be collapsed into the same symbol, so that each residue fits in fewer bits and the sequence takes less memory. Yes, I agree this is a good idea! :+1: I think most of this package is written in a pretty generic manner, so it should be relatively easy.
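As a rough illustration of the reduced-alphabet idea, here is a small Python sketch. It is independent of BioSequences.jl, and the particular grouping in `GROUPS` is an assumption chosen only so that the 20 standard residues collapse into 8 symbols; real reduced alphabets differ in which residues they merge. Twenty distinct residues need at least 5 bits each, while 8 groups fit in 3.

```python
# Hypothetical grouping of the 20 standard amino acids into 8 classes.
GROUPS = ["LIVM", "FYW", "KRH", "DE", "NQST", "AG", "C", "P"]

# Map each residue to the index of its group.
ENCODE = {aa: code for code, group in enumerate(GROUPS) for aa in group}

# Bits needed per symbol: 8 groups -> 3 bits.
BITS = (len(GROUPS) - 1).bit_length()


def pack(seq):
    """Pack a protein sequence into one integer, BITS bits per residue."""
    word = 0
    for i, aa in enumerate(seq):
        word |= ENCODE[aa] << (i * BITS)
    return word


def unpack(word, n):
    """Decode n residues, using the first member of each group as its representative."""
    mask = (1 << BITS) - 1
    return "".join(GROUPS[(word >> (i * BITS)) & mask][0] for i in range(n))


packed = pack("MKDP")
decoded = unpack(packed, 4)   # "LKDP": M is reported as L, its group representative
```

The encoding is lossy by design: leucine and isoleucine become indistinguishable, which is exactly the kind of behavior an alphabet interface would need to document for anyone implementing it.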