THUDM / ProteinLM

Protein Language Model
Apache License 2.0
112 stars 20 forks

Extracting embeddings? #5

Closed (ddofer closed this issue 3 years ago)

ddofer commented 3 years ago

Could you please provide an example for extracting embeddings (per position and per sequence/batch) from the models?

Yijia-Xiao commented 3 years ago

Hi, Ddofer! Thank you for your interest in our work. I think transformer_output is what you are looking for: it holds the protein embeddings. You can find transformer_output here. If you want to use ProteinLM to encode protein sequences, the easiest way is to dump the output of the transformer model to disk and then load those embeddings for your downstream tasks. If you want to train end to end instead, you will need to add some fine-tuning code on top of ProteinLM. Hope this answers your question :)
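To make the "dump, then load" suggestion concrete, here is a minimal sketch of turning a dumped transformer_output into per-position and per-sequence embeddings. It is not code from the ProteinLM repository. It assumes the dumped array has shape [batch, seq_len, hidden] and that a 0/1 padding mask is available; the function names, the random stand-in data, and the file name are all hypothetical. Mean pooling over non-padding positions is one common way to get a per-sequence vector, not necessarily what the authors use.

```python
import os
import tempfile

import numpy as np


def per_position_embeddings(transformer_output, attention_mask):
    """Return one [length_i, hidden] array per sequence, with padding removed."""
    return [
        seq[mask.astype(bool)]
        for seq, mask in zip(transformer_output, attention_mask)
    ]


def per_sequence_embeddings(transformer_output, attention_mask):
    """Masked mean pooling over positions, giving a [batch, hidden] array."""
    mask = attention_mask[:, :, None].astype(transformer_output.dtype)
    summed = (transformer_output * mask).sum(axis=1)
    # Guard against division by zero for an all-padding row.
    counts = np.clip(mask.sum(axis=1), 1, None)
    return summed / counts


# Stand-in for the array dumped from the model (ASSUMED layout: batch, seq_len, hidden).
rng = np.random.default_rng(0)
transformer_output = rng.normal(size=(2, 5, 8)).astype(np.float32)
# 1 marks a real residue, 0 marks padding; the second sequence has length 3.
attention_mask = np.array([[1, 1, 1, 1, 1],
                           [1, 1, 1, 0, 0]])

token_embs = per_position_embeddings(transformer_output, attention_mask)
seq_embs = per_sequence_embeddings(transformer_output, attention_mask)

# Dump the per-sequence embeddings and load them back for a downstream task.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "protein_embeddings.npy")  # hypothetical file name
    np.save(path, seq_embs)
    loaded = np.load(path)
```

If the model actually emits [seq_len, batch, hidden], transpose the array with transformer_output.transpose(1, 0, 2) before calling these helpers.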