RoniGurvich / Peptriever

Bi-Encoder approach for large-scale protein-peptide binding search
MIT License

Questions regarding inference script #18

Closed: luukhd2 closed this issue 4 months ago

luukhd2 commented 6 months ago

Inference script

I found this script on Hugging Face (https://huggingface.co/ronig/protein_biencoder). Can you confirm that it is still accurate? If so, just to double-check: a lower distance corresponds to a higher binding score, correct?

Thanks in advance for the help!

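import torch
from transformers import AutoModel, AutoTokenizer

# trust_remote_code=True lets transformers load the custom model class shipped with the checkpoint
tokenizer = AutoTokenizer.from_pretrained("ronig/protein_biencoder")
model = AutoModel.from_pretrained("ronig/protein_biencoder", trust_remote_code=True)
model.eval()  # inference mode (disables dropout)

# Toy sequences; replace with real peptide and protein sequences
peptide_sequence = "AAA"
protein_sequence = "MMM"
encoded_peptide = tokenizer.encode_plus(peptide_sequence, return_tensors='pt')
encoded_protein = tokenizer.encode_plus(protein_sequence, return_tensors='pt')

# forward1 is applied to the peptide, forward2 to the protein; no gradients are needed
with torch.no_grad():
    peptide_output = model.forward1(encoded_peptide)
    protein_output = model.forward2(encoded_protein)

# Euclidean (L2) distance between the two outputs
print("distance: ", torch.norm(peptide_output - protein_output, p=2))

RoniGurvich commented 6 months ago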

Hi @luukhd2. The script is accurate, and you are correct: lower distances mean higher binding scores.
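
For anyone screening several peptides against one protein, the "lower distance, higher binding score" rule amounts to sorting candidates by ascending distance. Here is a minimal sketch of that ranking step only. It does not load the model: the vectors are hypothetical 3-dimensional placeholders for the model outputs, and `rank_by_distance` and the peptide names are illustrative, not part of the Peptriever code.

```python
import math


def rank_by_distance(protein_vec, peptide_vecs):
    """Return (name, distance) pairs, closest (strongest predicted binding) first."""
    scored = [
        (name, math.dist(protein_vec, vec))  # Euclidean (L2) distance
        for name, vec in peptide_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical placeholder vectors, NOT real model outputs
protein_vec = [0.0, 0.0, 0.0]
peptide_vecs = {
    "pep_a": [3.0, 4.0, 0.0],
    "pep_b": [1.0, 0.0, 0.0],
    "pep_c": [0.0, 2.0, 0.0],
}

ranking = rank_by_distance(protein_vec, peptide_vecs)
for name, dist in ranking:
    print(f"{name}: {dist:.2f}")
```

With real data, the same ordering would be applied to the distances printed by the script above, one per peptide-protein pair.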