Closed ptynecki closed 4 years ago
Hi Piotr, thanks for your interest and these great questions!
Yes, you can certainly extract per-residue embeddings on GPU. It's as easy as calling model.cuda() before extracting the representations. Here's a short tutorial explaining this in more detail.
To answer your second question, you can get per-protein vectors by averaging the representations. It's a little more complicated than applying mean(dim=0) because it's important to (a) drop the initial beginning-of-sentence token; and (b) remove all padding tokens. You can use the provided extract.py script with --include mean to do this automatically. Here's the relevant line of code that applies the mean pooling.
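To make points (a) and (b) concrete, here is a minimal sketch of the pooling logic. It is not the code from extract.py. It uses plain Python lists in place of tensors so that it runs without any dependencies, and it assumes a layout where row 0 is the beginning-of-sentence token, rows 1 to seq_len are residues, and any remaining rows are padding. The function name mean_pool and the toy values are made up for illustration.

```python
from typing import List


def mean_pool(token_reprs: List[List[float]], seq_len: int) -> List[float]:
    """Average per-residue vectors, skipping the BOS token and padding.

    Assumed layout (for illustration only): row 0 is the
    beginning-of-sentence token, rows 1..seq_len are residues,
    and any rows after that are padding.
    """
    residues = token_reprs[1 : seq_len + 1]  # drop BOS (row 0) and padding rows
    if not residues:
        raise ValueError("sequence has no residues to average")
    dim = len(residues[0])
    # Column-wise mean over the residue rows only.
    return [sum(row[j] for row in residues) / len(residues) for j in range(dim)]


# Toy input: BOS + 2 residues + 1 padding row, embedding dimension 2.
padded = [
    [9.0, 9.0],  # BOS token
    [1.0, 2.0],  # residue 1
    [3.0, 4.0],  # residue 2
    [0.0, 0.0],  # padding
]
pooled = mean_pool(padded, seq_len=2)
print(pooled)  # [2.0, 3.0]
```

A naive mean over all four rows would give [3.25, 3.75] instead, because the BOS and padding rows leak into the average. With a PyTorch tensor x of shape (tokens, dim) and the same assumed layout, the equivalent slice would be x[1 : seq_len + 1].mean(0).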
I'm closing out this issue, but feel free to reopen if you have any more questions.
Hey,
Thank you for doing the research that is needed to tackle many biotech issues.
Is there any plan to add support for extracting per-residue embeddings on GPU (multi-GPU)?
I have another question: how can I use the ESM embeddings to get a per-protein vector? Is it enough to apply mean(dim=0)?
Thanks, Piotr