Closed pzhang84 closed 2 years ago

Awesome pre-trained models! I am using the provided pre-trained model for protein sequence embedding, and it seems that the model produces different embeddings each time for the same protein sequence (though the embeddings are similar in content). Does that mean the model is still training during the embedding process? Could you share any insight into this? Thanks!

The initial cell and hidden states are randomly initialized, so the first few embeddings are somewhat random until the state starts to converge. You can avoid this by running some warmup sequences first (see e.g. https://github.com/sacdallago/bio_embeddings/blob/a9cb5eb90dd13814fe59ef9aeef797be0b99b4e6/bio_embeddings/embed/seqvec_embedder.py#L72-L76)
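The warmup effect mentioned above can be illustrated with a self-contained toy model rather than the real SeqVec/bio_embeddings API (which requires downloading model weights): a small stateful RNN whose hidden state persists across `embed()` calls, just like the ELMo encoder's cell/hidden state. All names, sequences, and dimensions below are made up for illustration; the point is only that embeddings of the same sequence differ across runs at first, and agree once a few throwaway warmup sequences have washed out the random initial state.

```python
import numpy as np

class ToyStatefulEmbedder:
    """Minimal stand-in for a stateful sequence embedder: the hidden state
    persists across embed() calls, so the first embeddings depend on the
    randomly initialized state (hypothetical toy model, not the SeqVec API)."""

    def __init__(self, state_seed, dim=8):
        weight_rng = np.random.default_rng(0)        # same weights in every instance
        self.W = weight_rng.standard_normal((dim, dim))
        self.W *= 0.5 / np.linalg.norm(self.W, 2)    # spectral norm 0.5: contractive
        self.U = weight_rng.standard_normal((dim, 26))
        # Random initial hidden state, different per instance (i.e. per "run"):
        self.h = np.random.default_rng(state_seed).standard_normal(dim)

    def embed(self, seq):
        for aa in seq:                               # one-hot encode letters A..Z
            x = np.zeros(26)
            x[ord(aa) - ord("A")] = 1.0
            self.h = np.tanh(self.W @ self.h + self.U @ x)
        return self.h.copy()                         # "embedding" = final hidden state

# Two "runs" with identical weights but different random initial states.
run1 = ToyStatefulEmbedder(state_seed=1)
run2 = ToyStatefulEmbedder(state_seed=2)

query = "MKVLA"                                      # hypothetical query sequence
d_first = np.linalg.norm(run1.embed(query) - run2.embed(query))

# Warmup: feed a few throwaway sequences so the state forgets its random init.
for s in ["ACDEFGHIKL"] * 5:                         # hypothetical warmup sequences
    run1.embed(s)
    run2.embed(s)

d_warm = np.linalg.norm(run1.embed(query) - run2.embed(query))
# d_warm is tiny: after warmup both runs agree on the same query sequence.
```

Because the recurrence is contractive, every processed character shrinks the gap between the two runs' states, so `d_warm` is many orders of magnitude smaller than `d_first` — which is exactly why running warmup sequences before the real input makes the embeddings reproducible across runs.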