bebop / poly

A Go package for engineering organisms.
https://pkg.go.dev/github.com/bebop/poly
MIT License
659 stars 68 forks source link

Genbank parser needs heavy refactor or a rewrite #434

Open carreter opened 6 months ago

carreter commented 6 months ago

Describe the desired feature/enhancement

The io/genbank package needs a significant refactor or to be re-written entirely.

Is your feature request related to a problem?

There are several outstanding issues with the parser that will be difficult to address in its current state: #383 #352 #351 #342

Describe alternatives you've considered (optional)

Main alternative is to hack over the code as-is to fix bugs, but I think this will take more time than a rewrite.

carreter commented 6 months ago

Dropping a link to the GenBank spec here: https://www.ncbi.nlm.nih.gov/genbank/release/current/

TwFlem commented 6 months ago

That's a neat looking format :thinking:

Happy to take a whack at a rewrite.

A few questions:

  1. Do we want to incrementally rewrite this? For instance, starting with just parsing a genbank file? Would a good place to start be with just the info in the header? This way, we can discuss what error handling looks like, how we want things organized, setting the tone for how we want to parse the rest of the info, etc.
  2. Do we want to start with the latest version of the spec? Or do we want to build out the backwards compatibility for spec breaking changes along the way?
carreter commented 6 months ago
  1. I actually just kinda went at it, but would be happy to take a step back and replan - I've definitely noticed some weaknesses in my approach. I'll clean up the code and get it in committable state for just the header!
  2. I was targeting the latest version of the spec in my approach, but unsure what the best approach is of the two options you present.
TwFlem commented 6 months ago

re 1.

Oh nice! I don't want to step on your toes- so no pressure. Maybe there's room to divy up different parts of the spec after establishing how the header will be parsed? If so, I'd be happy to help with that!

Also, if it is inappropriate to divy up the work, I'd be happy to help build out 400 instead. That or any other issue that would help out Poly that is higher priority and or high difficulty.

re 2.

I think targeting the latest version of the spec is definitely the better option. Adding backwards compat after the fact seems straight forward in this case. It doesn't look like there are too many breaking changes and they seem pretty separate from everything else.

carreter commented 6 months ago

Not stepping on any toes :) I'll try and get that code up today, I'm away from the PC I had the code on.

Feel free to take a stab at any issues you would like!!!

carreter commented 6 months ago

Ok, draft PR is up that has the types + most of the layout of the parsers logic. @TwFlem let me know what you think!

TwFlem commented 6 months ago

@carreter That looks super good! That pattern will go along with way the genbank parser along with the other parsers as well. The error type looks super good as well. It looks flexible enough so we can get nice and specific about what it wrong. The way things are being done just looks good in general.

The only real value added suggestions are about testing.

The others are more speculation.

github-actions[bot] commented 4 months ago

This issue has had no activity in the past 2 months. Marking as stale.