churchlab / UniRep

UniRep model, usage, and examples.

If possible could the code for evotuning and the weights for eUniRep tuned to TEM-1 beta lactamase be shared? #18

Closed by ivanjayapurna 3 years ago

jmahenriques commented 4 years ago

Some code and/or a concise notebook on how to perform the evotuning would be greatly appreciated. We're interested in retraining UniRep with a particular set of sequences, and we already have a pipeline to compute the embeddings and test them on different top models and ML tasks. We are really curious to see whether the evotuned UniRep embeddings can improve our predictions and how they compare to other sequence descriptors such as one-hot encodings, z-scales, fingerprints, etc.

I suspect evotuning is of higher interest to academia & industry than the current example on how to train a top model using the original UniRep.
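For context, the one-hot baseline mentioned above can be sketched in a few lines. This is only an illustrative sketch, not code from UniRep or from our pipeline: the function name `one_hot_encode`, the alphabet ordering, and the short example sequence are all assumptions made for the example.

```python
# Standard 20 amino acids. The ordering is an arbitrary choice for this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence):
    """Return a flat one-hot vector of length len(sequence) * 20.

    Hypothetical helper: each residue becomes a block of 20 values
    with a single 1 at the index of that amino acid.
    """
    seq = sequence.upper()
    n_symbols = len(AMINO_ACIDS)
    vector = [0] * (len(seq) * n_symbols)
    for pos, aa in enumerate(seq):
        if aa not in AA_INDEX:
            raise ValueError(f"Unsupported residue {aa!r} at position {pos}")
        vector[pos * n_symbols + AA_INDEX[aa]] = 1
    return vector


# Arbitrary short example sequence (4 residues -> 80 features).
features = one_hot_encode("MSIQ")
print(len(features), sum(features))
```

Unlike a fixed-length learned embedding, this vector grows with sequence length, so sequences would need to be aligned or padded to a common length before being passed to the same top model.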

jmahenriques commented 4 years ago

Since the devs seem to be inactive, I'd like to suggest a simple and fast re-implementation of UniRep in JAX by Novartis scientists that is working for us at the moment: JAX-UniRep

It can embed multiple sequences at once and runs at least an order of magnitude faster. They also provide an interface for performing the evotuning, which is exactly what I and many other non-ML specialists needed.

Hope it is useful.

ivanjayapurna commented 4 years ago

Hey @jmahenriques I'm currently working with the creators of JAX-UniRep to fix a bug with the evotuning function: https://github.com/ElArkk/jax-unirep/issues/37

Did you run into this issue in your own use of the library?

jmahenriques commented 4 years ago

Hi @ivanjayapurna! Nice, I really like their implementation and enjoyed reading the technical paper. I am definitely planning to retrain the model using JAX-UniRep, but I can't allocate the needed time and resources until it is clear to us whether we can use the pre-trained weights for both embedding and evotuning. They seem to be protected against commercial use, and I work at a pharmaceutical company. In the worst case, we might need to train the model from scratch, reproducing the original UniRep paper, in order to work around the license. I will take a look at the issue you mention, and if I run into it, I'll make sure to report it. Thanks!