ElArkk / jax-unirep

Reimplementation of the UniRep protein featurization model.
GNU General Public License v3.0

embed_matrix:0.npy in evotuned dumped weights #83

Closed mengqvist closed 3 years ago

mengqvist commented 4 years ago

I noted that the evotuned GFP weights from the original UniRep article contain an "embed_matrix:0.npy" file, whereas the weights dumped by your implementation using dump_params() do not. Is this a feature or a bug? It seems to me that it breaks compatibility between the two libraries.

Since this file seems to be static, it is simple enough to copy it over from the original publication into the folder with the evotuned weights, but I would find it desirable to have it there by default.

ElArkk commented 4 years ago

Hi @mengqvist ,

Good catch. embed_matrix:0.npy contains the vectors for the initial 10-dimensional embedding of each amino acid (the embedded sequences are then passed to the mLSTM).
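To illustrate the idea, here is a minimal sketch of such an embedding lookup. The vocabulary, the random initialisation, and the function name are assumptions for demonstration only; the real embed_matrix:0.npy holds weights learned on UniRef50, not random values:

```python
import numpy as np

# Assumed vocabulary: the 20 standard amino acids (the real UniRep
# vocabulary also includes special tokens).
AA_VOCAB = "ACDEFGHIKLMNPQRSTVWY"
aa_to_idx = {aa: i for i, aa in enumerate(AA_VOCAB)}

rng = np.random.default_rng(0)
# One 10-dimensional vector per token, analogous in shape to the
# learned embedding matrix stored in embed_matrix:0.npy.
embed_matrix = rng.normal(size=(len(AA_VOCAB), 10))

def embed_sequence(seq: str) -> np.ndarray:
    """Look up the 10-dim embedding for each residue in the sequence."""
    idxs = [aa_to_idx[aa] for aa in seq]
    return embed_matrix[idxs]  # shape: (len(seq), 10)

embedded = embed_sequence("MKTAY")
print(embedded.shape)  # (5, 10)
```

Each row of the result is then consumed by the mLSTM as the per-residue input vector.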

In the original UniRep implementation, this embedding matrix gets randomly initialised, and then learned together with the mLSTM weights during training.

So far, we have not implemented this embedding layer in jax-unirep, and so we always use the embedding matrix from the original publication, which was learned on the UniRef50 dataset. It has been in the back of my mind for a while now to "complete" the re-implementation by also implementing the embedding layer, so that custom embeddings can be generated during evotuning (or during a complete re-training of the model).

For now, I think it's a good idea to dump the embedding matrix together with the rest, to make sure dumped weights can be used by both libraries. Let me know if you'd like to submit a small PR yourself to change this behaviour.
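A rough sketch of what dumping the embedding matrix alongside the other weights could look like. Note that dump_params_with_embedding and the params layout here are assumptions for illustration, not jax-unirep's actual API; only the "embed_matrix:0.npy" filename comes from the original UniRep weight dumps:

```python
import os
import numpy as np

def dump_params_with_embedding(params, embed_matrix, out_dir):
    """Hypothetical helper: save each weight array plus the embedding
    matrix, so the dumped folder matches the original UniRep layout."""
    os.makedirs(out_dir, exist_ok=True)
    for name, arr in params.items():
        np.save(os.path.join(out_dir, f"{name}.npy"), arr)
    # Use the same filename the original implementation ships with,
    # so the dumped weights stay compatible with both libraries.
    np.save(os.path.join(out_dir, "embed_matrix:0.npy"), embed_matrix)
```

Since the matrix is currently static, this amounts to copying the same array into every dump, but it keeps the folder layout interchangeable with the original weights.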