The goal we came up with:
Solve low-data problems in RNA design through development of diverse sequences, design principles, and automated tools.
This is currently recorded authoritatively in our one-page design document.
To break down the parts of this:
As might have been expected, we will continue to focus specifically on RNA.
We choose to focus specifically on low-data problems, that is, areas with little experimental data. AI and ML tools can perform very well once given sufficient data about which designs perform well and which do not (as is already the case for recent secondary structure design methods, many of which are based on Eterna's past research). However, human intuition and creativity remain useful when there is not enough data available to train such algorithms, or when players can contribute things like novel approaches to featurization.
We conceive of the desired outputs of our system as something of a pipeline. We've found that while purely aiming to generate useful sequences can be beneficial, our ultimate goal is to come up with scalable solutions to design problems. To do this, we must first have a diverse set of sequences to validate, so that we can explore the possibility space as broadly as possible and do rigorous analysis. From those results we then develop principles that can be applied more broadly to other situations and targets, so that we have actually learned from our research. Ultimately, those principles lead to the development of automated tools, so that novel problems can be solved quickly, effectively, and efficiently.