bebop / poly

A Go package for engineering organisms.
https://pkg.go.dev/github.com/bebop/poly
MIT License

The Poly RBS calculator #145

Closed Koeng101 closed 3 years ago

Koeng101 commented 3 years ago

The Salis Lab has previously built a ribosome binding site (RBS) calculator, which predicts translation initiation rates from mRNA sequences.

However, it is slow (requiring a queue on a website) and closed source. To incorporate RBS calculations into more complex applications, we need better performance and faster development. The best advancements in technology should be incorporated in an open-source manner.

Basic idea of RBS calculator

The basic idea behind the RBS calculator (this is a simplification) is that you take the binding energy of the ribosomal 16S RNA to the mRNA's RBS and subtract from it the folding energy of the mRNA binding to itself. The more negative the result, the more favorable ribosome binding is. There are a few other variables, but these are the basic ones (please check Table 2 of https://pubs.acs.org/doi/suppl/10.1021/acssynbio.0c00394/suppl_file/sb0c00394_si_001.pdf for the equations).

The mRNA differs with every design, so its folding energy must be calculated each time the calculator is run. The 16S RNA, on the other hand, is not very variable. The organisms people use follow an approximately power-law distribution, so we can cache most of the 16S-RNA to RBS (which I will now call 16S-RBS) data in a lookup table.

Software and numbers we need

It is important to keep in mind that we want this software to be fast. There are two primary optimizations for getting that performance: (1) using a faster algorithm for calculating RNA secondary structure (we use LinearFold, which folds RNA in linear time) and (2) using a lookup table for the slow RNA-to-RNA binding calculations.

In order to calculate mRNA folding, @vivekr has ported LinearFold to Golang. This package needs to be incorporated into Poly before we build the RBS calculator.

In order to calculate the 16S-RBS lookup table, we will likely need to operate outside of Golang (probably in Python). LinearFold does not support (at this time) multiple separate RNAs binding to each other, so we'll have to do this work with a different algorithm. It will be a challenge to relate the two numbers, since they come from different software packages. Since the 16S RNA binding sequence is only 9 nucleotides long, we theoretically only have to calculate its binding energy to 4^9 = 262,144 other RNAs.
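As a sketch of how such a table could be laid out on the Go side, each 9-nucleotide site can be packed into 2 bits per base and used as an index into a flat slice of 4^9 entries. The names `siteIndex`, `baseCode`, and `tableSize` are hypothetical, and the table is left empty here because the energies would come from whichever external RNA-RNA binding tool ends up being used.

```go
package main

import "fmt"

const (
	siteLen   = 9                  // length of the 16S binding sequence, per the text above
	tableSize = 1 << (2 * siteLen) // 4^9 = 262,144 possible sites
)

// baseCode maps an RNA base to a 2-bit code. The second return value is
// false for any character other than A, C, G, or U.
func baseCode(b byte) (int, bool) {
	switch b {
	case 'A':
		return 0, true
	case 'C':
		return 1, true
	case 'G':
		return 2, true
	case 'U':
		return 3, true
	}
	return 0, false
}

// siteIndex packs a 9-nt RNA site into an integer in [0, tableSize).
func siteIndex(site string) (int, error) {
	if len(site) != siteLen {
		return 0, fmt.Errorf("site must be %d nt, got %d", siteLen, len(site))
	}
	idx := 0
	for i := 0; i < len(site); i++ {
		code, ok := baseCode(site[i])
		if !ok {
			return 0, fmt.Errorf("unexpected base %q at position %d", site[i], i)
		}
		idx = idx<<2 | code
	}
	return idx, nil
}

func main() {
	// One slot per possible site. In practice this would be filled offline
	// with precomputed 16S-RBS binding energies (kcal/mol).
	table := make([]float32, tableSize)

	idx, err := siteIndex("UAAGGAGGU") // example site, for illustration only
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("table entries:", len(table), "index:", idx, "energy:", table[idx])
}
```

A flat slice keeps each lookup to a single array access and takes about 1 MB per organism at `float32` precision, which makes it cheap to ship one table for each commonly used 16S sequence.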

There are other parameters that assist in doing RBS calculations (such as ΔGstandby, from https://pubs.acs.org/doi/suppl/10.1021/acssynbio.0c00394/suppl_file/sb0c00394_si_001.pdf). We'll likely need to build those into the calculator at some point, but perhaps not in version 1.

Testing

After we get a functioning prototype of the RBS calculator, we can tune our model. One dataset from the Salis Lab has 9862 sequences, and we can directly compare our calculator's outputs against the ones published by the Salis Lab. We can also use empirical measurements of ~300,000 RBSs from "Large-scale DNA-based phenotypic recording and deep learning enable highly accurate sequence-function mapping". Using a couple of these data sets, we should be able to tune our RBS calculator until it is "good enough".

While it probably won't be as accurate as machine learning models trained on large datasets for a given organism, we can offer our calculator's output to those models as an input parameter, and hopefully improve their predictions by giving them more data.

The goal is to make something that is useful to scientists and engineers. Our calculator can still be mildly wrong, so long as it is fundamentally useful to practitioners.

In-vivo testing

After we build the Poly RBS calculator, Sporenet Labs (aka Keoni Gandall, aka me) plans to test how well it performs in a real laboratory environment. As the group who builds the thing, we'll all decide together what experiments we should run. Ideally, we'll be using Bxb1-GFP in E. coli with an oligo pool or a degenerate primer library + some Nanopore sequencing.

rkrishnasanka commented 3 years ago

I know someone from the Salis Lab; maybe they're willing to share the source for their calculator. (A v0)

ayaanhossain commented 3 years ago

Happy to chime in RK. Although I am not a developer for RBS Calculator, the v1.0 code is already open source and can be found on Dr. @hsalis' GitHub here.

Koeng101 commented 3 years ago

> Happy to chime in RK. Although I am not a developer for RBS Calculator, the v1.0 code is already open source and can be found on Dr. @hsalis' GitHub here.

Version 1 is, but the updated versions aren't. It's not really developed with open source in mind, sadly. I have used that source code to understand how it works - very useful!

rkrishnasanka commented 3 years ago

@ayaanhossain you think it might be possible to connect/work with @Koeng101 to get the latest version?

Edit - apologies @hsalis if this came across the wrong way. My intention was to facilitate an intellectual conversation with the experts (i.e. your research group). I was under the impression that the latest version of the calculator was set to be open-sourced. There was in no way any intention to coerce anyone to share unpublished code.

hsalis commented 3 years ago

You're welcome to re-implement the RBS Calculator algorithm using our v1.0 source code for inspiration [so long as you follow the terms of its open source license]. And I'd be happy to answer questions about how it works. But please do not ask members of my lab for unpublished source code. That's called theft.

TimothyStiles commented 3 years ago

> You're welcome to re-implement the RBS Calculator algorithm using our v1.0 source code for inspiration [so long as you follow the terms of its open source license]. And I'd be happy to answer questions about how it works. But please do not ask members of my lab for unpublished source code. That's called theft.

@hsalis I don't think theft was @rkrishnasanka's intent, but I agree that he should've asked you and me before suggesting this.

@Koeng101 may be able to talk about this more, but I believe our implementation is going to end up being different in several ways, and of course we'll cite your work both in publications and in the source code itself.

Koeng101 commented 3 years ago

Closing. Work will take place at https://github.com/allyourbasepair/rbscalculator until it is stable.

API released here: http://api.rbscalculator.com/docs