This PR implements several major refactors and improvements to gRNAde's modelling, data preparation, and featurization, as well as its user interface.
Here is a summary:
Data processing is now considerably more thorough. We save relevant metadata alongside the sequence and structures for each RNA, and we cluster RNAs by both sequence identity and structural similarity.
To stay organised, we have split the data-related classes and methods across a few separate files.
Constants are now set and handled consistently throughout the codebase using environment variables.
We have improved the training and evaluation-related sections of the codebase, incorporating better metrics and improving logging to wandb.
We have added a command-line tool for using gRNAde, along with model checkpoints, so that users can design their own RNAs.
To that end, we have also created a tutorial, in the form of an interactive Jupyter notebook, that introduces how to use gRNAde.
Overall, the codebase is much improved in my opinion.
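
For reviewers unfamiliar with the environment-variable approach to constants mentioned in the summary, here is a minimal sketch of the general pattern. It is not the code in this PR: the helper names (`get_env_path`, `get_env_float`) and the variable names (`EXAMPLE_DATA_DIR`, `EXAMPLE_SEQ_ID_THRESHOLD`) are hypothetical and chosen only for illustration.

```python
import os
from pathlib import Path


def get_env_path(name: str, default: str) -> Path:
    """Return the path stored in environment variable `name`, or `default` if it is unset."""
    return Path(os.environ.get(name, default)).expanduser()


def get_env_float(name: str, default: float) -> float:
    """Return environment variable `name` parsed as a float, or `default` if it is unset."""
    raw = os.environ.get(name)
    return default if raw is None else float(raw)


# Hypothetical constants, resolved once at import time (names are assumptions).
DATA_DIR = get_env_path("EXAMPLE_DATA_DIR", "./data")
SEQ_ID_THRESHOLD = get_env_float("EXAMPLE_SEQ_ID_THRESHOLD", 0.8)
```

With this pattern, a user can override a default without editing the code, for example by exporting `EXAMPLE_DATA_DIR` in the shell before launching a script.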