OpenProteinAI / PoET

Inference code for PoET: A generative model of protein families as sequences-of-sequences
MIT License
47 stars, 2 forks

PoET for peptide binding affinity prediction #2

Open vladimirkovacevic opened 4 months ago

vladimirkovacevic commented 4 months ago

Would it make sense to use PoET for binding affinity prediction (IC50 values) of short peptide sequences (8-16 amino acids)? If so, do you have any suggestions or comments on how to do it? Is it possible to obtain an embedding (representation) for each protein sequence and pass it to a custom classification head that I would train?

Thanks!

timt51 commented 4 months ago

We haven't really tested with short peptides in particular, but if you have at least a couple of relevant homologs to condition on, PoET should be helpful. Fine-tuning PoET or training a model on top of PoET embeddings as you suggest are definitely reasonable ways of utilizing PoET for this!

The embed method can be used to get embeddings (see the scoring script for example usage). If you want embeddings for many sequences conditioned on the same prompt, you can get them more efficiently by calling the logits method with return_embeddings=True (also shown in the scoring script).
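As a rough illustration of the "model on top of embeddings" route, here is a minimal sketch of a small regression head. It does not call PoET: the random arrays stand in for per-residue embeddings you would get from embed or logits, and the embedding width (32), the mean pooling, the log10 transform of IC50, and the helper names (mean_pool, fit_ridge, predict) are all assumptions made for the example. Ridge regression is used rather than a classifier because IC50 is a continuous value.

```python
import numpy as np


def mean_pool(per_residue: np.ndarray) -> np.ndarray:
    """Collapse a (length, dim) per-residue embedding into one (dim,) vector."""
    return per_residue.mean(axis=0)


def fit_ridge(X: np.ndarray, y: np.ndarray, alpha: float = 1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean
    # Solve (Xc^T Xc + alpha * I) w = Xc^T yc for the weight vector.
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def predict(X: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    """Apply the fitted linear head to pooled embeddings."""
    return X @ w + b


# Placeholder data (assumption): random arrays in place of real PoET embeddings
# for 40 peptides of length 8-16, plus made-up IC50 values in nM.
rng = np.random.default_rng(0)
dim = 32  # hypothetical embedding width, not PoET's actual size
lengths = rng.integers(8, 17, size=40)
embeddings = [rng.normal(size=(int(n), dim)) for n in lengths]
ic50_nm = rng.uniform(1.0, 1e4, size=40)

X = np.stack([mean_pool(e) for e in embeddings])  # shape (40, dim)
y = np.log10(ic50_nm)                             # regress on log10(IC50)

w, b = fit_ridge(X, y, alpha=1.0)
pred_log_ic50 = predict(X, w, b)
```

With real data you would hold out a test split and tune alpha. Mean pooling is only one option for variable-length peptides, and a small neural head may work better than a linear one.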

If you're interested, we also offer REST and Python APIs for getting embeddings from PoET. You can request access through this form, and access is free for academic use.

vladimirkovacevic commented 4 months ago

Thank you @timt51 for the very detailed explanation. I'll try out what you've suggested. I've also filled out the form you linked. I'll let you know how it goes.