Sheffield-iGEM / syn-zeug

A modern toolbox for synthetic biology
https://sheffield-igem.github.io/syn-zeug/
GNU Affero General Public License v3.0
6 stars 3 forks

Add Inner `SeqKinds` for Discriminating Between Canonical, N-Containing, and IUPAC Alphabets #21

Closed TheLostLambda closed 2 years ago

TheLostLambda commented 2 years ago

The existing SeqKind enum could have an inner field added – something like Dna(KindExt::IUPAC).

This would affect:

  1. The Display implementation -> DNA (IUPAC)
  2. The output of kind()
  3. Sequence validation within constructors
  4. Filtering of text in future sequence filtering operations like this tool
  5. The magic sequence constructor (for automatically determining sequence kind)

See the Rust-bio docs for more.
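A minimal sketch of what that nesting might look like. Everything beyond `SeqKind`, `KindExt` and the `DNA (IUPAC)` display string is an assumption for illustration: the `Rna` and `Protein` variants, the `Canonical` and `N` variant names, and the choice to print canonical sequences without a suffix are all hypothetical. `Iupac` is spelled in Rust's usual variant casing rather than `IUPAC`.

```rust
use std::fmt;

/// Hypothetical alphabet extension carried inside each `SeqKind` variant.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KindExt {
    /// Only the canonical symbols (e.g. A, C, G, T for DNA).
    Canonical,
    /// Canonical symbols plus the `N` placeholder.
    N,
    /// The full IUPAC ambiguity alphabet.
    Iupac,
}

/// `SeqKind` with the inner field described above.
/// The `Rna` and `Protein` variants are assumed, not taken from the issue.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SeqKind {
    Dna(KindExt),
    Rna(KindExt),
    Protein(KindExt),
}

impl fmt::Display for SeqKind {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Split the kind into a printable name and its extension.
        let (name, ext) = match *self {
            SeqKind::Dna(ext) => ("DNA", ext),
            SeqKind::Rna(ext) => ("RNA", ext),
            SeqKind::Protein(ext) => ("Protein", ext),
        };
        // Assumption: canonical sequences print without a suffix.
        match ext {
            KindExt::Canonical => write!(f, "{}", name),
            KindExt::N => write!(f, "{} (N)", name),
            KindExt::Iupac => write!(f, "{} (IUPAC)", name),
        }
    }
}
```

With this shape, `SeqKind::Dna(KindExt::Iupac).to_string()` yields `DNA (IUPAC)`, which covers point 1 of the list, and `kind()` could return the same value for point 2.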

TheLostLambda commented 2 years ago

My original idea of nesting these extensions inside the Kind type was a bit misguided. Kind is necessarily a user-exposed enum (the user needs to construct it to use methods like convert()), but it never makes sense for the user to change the KindExt. You can't just remove Ns or other special placeholders (not without more logic and assumptions than I'm comfortable putting in a type conversion).

The Kind and KindExt will need to be siblings, not parent and child. This probably means making a new Kind struct, then nesting the SubKind and KindExt enums inside it. I suppose now the biggest problem is finding a nicer name for SubKind, since that will likely be the primary point of interaction for the user (when using convert()).
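A rough sketch of the sibling layout, assuming the struct approach. The field names `sub` and `ext`, the `SubKind` variants, and the `with_sub` helper are hypothetical and only there to show how a conversion could change the sub-kind while leaving the extension alone.

```rust
/// What the user picks when calling something like `convert()`.
/// Variant names are assumed for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SubKind {
    Dna,
    Rna,
    Protein,
}

/// Which alphabet the sequence uses; never chosen by the user directly.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KindExt {
    Canonical,
    N,
    Iupac,
}

/// `SubKind` and `KindExt` sit side by side instead of one inside the other.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Kind {
    pub sub: SubKind,
    pub ext: KindExt,
}

impl Kind {
    /// Hypothetical helper: switch the sub-kind and carry the extension
    /// over unchanged, so a conversion can never silently drop Ns.
    pub fn with_sub(self, sub: SubKind) -> Kind {
        Kind { sub, ..self }
    }
}
```

Here a caller would only ever name a `SubKind`, and the `KindExt` follows the sequence around untouched.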

Honestly, naming here will be dreadful (mostly because type is a reserved keyword)... Kind, Alphabet, TypeInfo, etc... The alternative would be to keep Kind as is and add an alphabet field to Seq, whose Alphabet enum holds Canonical, N, Iupac, or something similar (with better names, please...). I'll need to try both the new struct and the new field on Seq to see which works better. I'm leaning towards a new field, but that's going to complicate error printing quite a bit...
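For comparison, a sketch of the new-field alternative, with Kind left as a plain enum. The `Seq` fields, the `Seq::dna` constructor, and the `dna_alphabet` helper are hypothetical, and accepting lowercase symbols is an assumption. The helper picks the smallest alphabet that covers every symbol, which is roughly what constructor validation and the magic constructor would need.

```rust
/// Ordered from most to least restrictive so that `max` widens the alphabet.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Alphabet {
    Canonical,
    N,
    Iupac,
}

/// Kind stays exactly as the user already knows it (variants assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Dna,
    Rna,
    Protein,
}

/// Hypothetical `Seq` with the extra `alphabet` field.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Seq {
    bytes: Vec<u8>,
    kind: Kind,
    alphabet: Alphabet,
}

/// Smallest DNA alphabet containing every symbol, or `None` if any
/// symbol is not a valid IUPAC DNA code. Case-insensitive by assumption.
pub fn dna_alphabet(bytes: &[u8]) -> Option<Alphabet> {
    let mut alphabet = Alphabet::Canonical;
    for b in bytes {
        let needed = match b.to_ascii_uppercase() {
            b'A' | b'C' | b'G' | b'T' => Alphabet::Canonical,
            b'N' => Alphabet::N,
            b'R' | b'Y' | b'S' | b'W' | b'K' | b'M' | b'B' | b'D' | b'H' | b'V' => {
                Alphabet::Iupac
            }
            _ => return None,
        };
        // Widen to the least restrictive alphabet seen so far.
        alphabet = alphabet.max(needed);
    }
    Some(alphabet)
}

impl Seq {
    /// Validate the input as DNA and record which alphabet it needs.
    pub fn dna(bytes: impl Into<Vec<u8>>) -> Option<Seq> {
        let bytes = bytes.into();
        let alphabet = dna_alphabet(&bytes)?;
        Some(Seq { bytes, kind: Kind::Dna, alphabet })
    }

    pub fn kind(&self) -> Kind {
        self.kind
    }

    pub fn alphabet(&self) -> Alphabet {
        self.alphabet
    }
}
```

Under these assumptions `Seq::dna("ACGTN")` reports `Kind::Dna` with `Alphabet::N`, and anything outside the IUPAC DNA codes is rejected. A real constructor would presumably return a proper error rather than `Option`, which is where the error-printing complication comes in.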

It's worth mentioning that this is a bit of a prerequisite to #7, as you can't really translate something containing Ns or IUPAC codes...

TheLostLambda commented 2 years ago

What a mess...