bebop / poly

A Go package for engineering organisms.
https://pkg.go.dev/github.com/bebop/poly
MIT License
673 stars 73 forks source link

Genbank error checking #244

Closed Koeng101 closed 2 years ago

Koeng101 commented 2 years ago

This is a genbank error checking file. As with the other branches, multi genbank parsing was removed. Parse, write, and read now defaults to outputting a list, though we don't have multi-genbank working quite yet.

I deleted TestLocusParseRegression because it doesn't work. These two lines are different, which the old parser did not pick up, but the newer parser picks up. https://github.com/TimothyStiles/poly/blob/359da2c15a5b20c12b68d5eb25f8254dd67ee62a/data/puc19static.gbk#L43

https://github.com/TimothyStiles/poly/blob/359da2c15a5b20c12b68d5eb25f8254dd67ee62a/data/puc19.gbk#L43

TimothyStiles commented 2 years ago

Biggest concern is that the original Genbank/MultiGenbank paradigm has been overwritten where now MultiGenbank is now just being called Genbank. Can we have both?

Koeng101 commented 2 years ago

Biggest concern is that the original Genbank/MultiGenbank paradigm has been overwritten where now MultiGenbank is now just being called Genbank. Can we have both?

Yes - Genbank is just MultiGenbank of list size 1 (in fact, the parser only does single Genbank right now, gotta fix it up to do multi. Lemme try that out). Much easier than having separate functions I think.

Removed redundancy in latest push and deleted puc19static - which was broken in multiple ways.

TimothyStiles commented 2 years ago

@Koeng101 don't forget to merge upstream. I'm pretty sure I deleted that static puc19 file in my last merged PR.

TimothyStiles commented 2 years ago

Biggest concern is that the original Genbank/MultiGenbank paradigm has been overwritten where now MultiGenbank is now just being called Genbank. Can we have both?

Yes - Genbank is just MultiGenbank of list size 1 (in fact, the parser only does single Genbank right now, gotta fix it up to do multi. Lemme try that out). Much easier than having separate functions I think.

@Koeng101 I think separate functions would be better for clarity but we can debate later. It'd be super easy to add a variadic argument to a multigenbank parser to limit number of genbanks parsed and just call that from our "single" genbank parser.

Koeng101 commented 2 years ago

@Koeng101 don't forget to merge upstream. I'm pretty sure I deleted that static puc19 file in my last merged PR.

Yes, this branch is merged upstream, and no, puc19static wasn't deleted because it was used for testing (I deleted that test)

I think separate functions would be better for clarity but we can debate later

Ye that is fair

Koeng101 commented 2 years ago

@TimothyStiles How is this looking? I recall I removed the multi-genbank functions, but is there anything else?

TimothyStiles commented 2 years ago

@Koeng101 found an issue with the genbank parser where it somehow strips out complements from the gbk string while parsing. Any idea on what's going on here? Screen Shot 2022-06-18 at 4 46 51 AM Screen Shot 2022-06-18 at 4 45 58 AM

TimothyStiles commented 2 years ago

Also need to fix qualifier line breaks but that should be easy enough Screen Shot 2022-06-18 at 4 51 23 AM h