Koeng101 / dnadesign

A Go package for designing DNA.

Restriction Enzymes from Rebase #76

Open rmcl opened 3 months ago

rmcl commented 3 months ago

Describe the desired feature/enhancement

I'd like the ability to use many more restriction sites much like biopython's Restriction package.

Is your feature request related to a problem?

I'm implementing a cloning design tool and have need for many more restriction enzymes. There isn't a problem with dnadesign, but it would be nice if it had more restriction sites built in.

Describe the solution you'd like

I'd be happy to contribute a script that downloads the REBASE distribution, much like Biopython's (https://github.com/biopython/biopython/blob/master/Scripts/Restriction/rebase_update.py). The Biopython solution has scripts that pull the REBASE files off the FTP site and then generate a Python file with a huge dictionary of restriction sites (https://github.com/biopython/biopython/blob/master/Bio/Restriction/Restriction_Dictionary.py). This dictionary file is then committed to the repo as part of the Biopython library.
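To make the generated-dictionary idea concrete, here is a minimal sketch of what such a generated file could look like in Go. The `Enzyme` struct, its field names, and the `Enzymes` map are hypothetical names chosen for illustration; they are not the layout used by dnadesign or Biopython. Only three well-known enzymes are listed, where a real generated file would hold the whole database.

```go
package main

import (
	"fmt"
	"sort"
)

// Enzyme is a hypothetical record type for this sketch; it is not the
// dnadesign Enzyme type.
type Enzyme struct {
	Name   string
	Site   string // recognition sequence, written 5'->3'
	TopCut int    // offset from the start of Site where the top strand is cut
}

// Enzymes stands in for the large table a generator script would emit
// from the REBASE files and commit to the repository.
var Enzymes = map[string]Enzyme{
	"EcoRI":   {Name: "EcoRI", Site: "GAATTC", TopCut: 1},   // G^AATTC
	"BamHI":   {Name: "BamHI", Site: "GGATCC", TopCut: 1},   // G^GATCC
	"HindIII": {Name: "HindIII", Site: "AAGCTT", TopCut: 1}, // A^AGCTT
}

func main() {
	// Map iteration order is random in Go, so sort the names first.
	names := make([]string, 0, len(Enzymes))
	for name := range Enzymes {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		e := Enzymes[name]
		fmt.Printf("%s\t%s\tcut=%d\n", e.Name, e.Site, e.TopCut)
	}
}
```

With this approach the end user only imports the package; the cost is that someone has to rerun the generator whenever the table should be refreshed.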

I think that this is a nice solution since the end-user doesn't have to worry about downloading rebase themselves. I looked around for another go package that does this, but it doesn't seem to exist.

Describe alternatives you've considered (optional)

If it doesn't make sense to include in dnadesign, I could also just create a separate go package.

Additional context

Let me know if this is something that you'd like in dnadesign and I'll send a MR. Otherwise, I'll just create a separate package, but it would be nice to use the DNADesign Enzyme/EnzymeManager because it seems to have some of the cut site search logic built in already.

Thanks!

Koeng101 commented 3 months ago

Hey there! Thanks for writing an issue to the project. I appreciate it.

Rebase should be integrated here - https://github.com/Koeng101/dnadesign/commit/27b41fb4fdb849d569278c965849e6f28fb2a7f6

I don't really do frequent updates though. Is that something you need?

rmcl commented 3 months ago

Hey! I implemented the rebase script and defined a bunch of restriction enzymes in this package: https://pkg.go.dev/github.com/rmcl/restriction-enzymes & https://github.com/rmcl/restriction-enzymes.

Frequent updates aren't too important to me, but I thought I'd include the download script for completeness.

This separate library probably works for my purposes, but if you'd like to include this in dnadesign or upstream Poly, that's cool too. Happy to send a MR if it's helpful.

Thanks for the great library.

Cheers, Russell

Koeng101 commented 2 months ago

Fantastic! Looking through, I think the fact that you can directly translate from any restriction enzyme to fragments is simply better than our implementation. On the other side, there are some stylistic changes that would be necessary for an integration (which you may or may not want to do!). The TL;DR is that I think three things could be improved: documentation (especially the context of use for each package), testing coverage, and integration. For example, I think REBASE should probably not be generated as a Golang dictionary, but embedded as a string that is parsed upon import of a rebase package. That has the advantage that updating only requires curl'ing the newest version of the database, with no need to run scripts manually. I think there are some opportunities to simplify code for particular use cases as well (I definitely want to simplify some of our code around this - purge enzymemanager and such, because I've found it awkward to use).
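A rough sketch of the embed-and-parse idea, under stated assumptions: the `rebaseData` constant stands in for a database file that a real package could pull in with Go's `//go:embed` directive, and the record layout (`<1>` for the name, `<3>` for the site with `^` marking the top-strand cut) is a simplified format modeled loosely on REBASE's tagged records, not the actual file. `Enzyme`, `parse`, and `cutLinear` are hypothetical names, and `cutLinear` only handles the top strand of a linear sequence.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// rebaseData is a stand-in for the embedded database text. The layout is a
// simplified assumption, not the real REBASE file.
const rebaseData = `<1>EcoRI
<3>G^AATTC

<1>BamHI
<3>G^GATCC
`

// Enzyme is a hypothetical record type for this sketch.
type Enzyme struct {
	Name   string
	Site   string // recognition sequence with the cut marker removed
	TopCut int    // offset into Site where the top strand is cut
}

// enzymes is filled once, when the package is loaded.
var enzymes map[string]Enzyme

func init() { enzymes = parse(rebaseData) }

// parse reads the tagged records and keeps entries with a name, a
// non-empty site, and a cut marker.
func parse(data string) map[string]Enzyme {
	out := make(map[string]Enzyme)
	var cur Enzyme
	sc := bufio.NewScanner(strings.NewReader(data))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case strings.HasPrefix(line, "<1>"):
			cur = Enzyme{Name: strings.TrimPrefix(line, "<1>")}
		case strings.HasPrefix(line, "<3>"):
			raw := strings.TrimPrefix(line, "<3>")
			cur.TopCut = strings.Index(raw, "^")
			cur.Site = strings.Replace(raw, "^", "", 1)
			if cur.Name != "" && cur.Site != "" && cur.TopCut >= 0 {
				out[cur.Name] = cur
			}
		}
	}
	return out
}

// cutLinear splits the top strand of a linear sequence at every
// occurrence of the enzyme's site.
func cutLinear(seq string, e Enzyme) []string {
	var frags []string
	start, i := 0, 0
	for {
		j := strings.Index(seq[i:], e.Site)
		if j < 0 {
			break
		}
		cut := i + j + e.TopCut
		frags = append(frags, seq[start:cut])
		start = cut
		i += j + 1
	}
	return append(frags, seq[start:])
}

func main() {
	fmt.Println(cutLinear("AAGAATTCTT", enzymes["EcoRI"])) // prints [AAG AATTCTT]
}
```

Compared with a generated Go map, refreshing the data here means replacing one text file; the trade-off is a small parsing cost at program start and errors that surface at run time rather than compile time.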

I'll probably take a swing at this at some point in the near future if you don't want to, but I'm also happy to work collaboratively. Also would love to know what you're actually using this for!

Is there a particular reason why dseq is implemented differently than our implementation? I'm sure there is a reason (ours doesn't cover all use cases), but I'd like to know precisely what the decision behind that was.