evolutionaryscale / esm


How to Predict Functional Annotation #31

Closed issah-samori closed 4 months ago

issah-samori commented 4 months ago

Hello,

From the example walkthroughs in this repo (generate.ipynb), it is clear how to generate a sequence and a structure from a prompt. Code from the example is below:

protein_prompt = ESMProtein(sequence=sequence_prompt, coordinates=structure_prompt)
sequence_generation_config = GenerationConfig(
    track="sequence",
    num_steps=sequence_prompt.count("_") // 2,
    temperature=0.5,
)

sequence_generation = model.generate(protein_prompt, sequence_generation_config)
structure_prediction_config = GenerationConfig(
    track="structure",
    num_steps=len(sequence_generation) // 8,
    temperature=0.7,
)
structure_prediction_prompt = ESMProtein(sequence=sequence_generation.sequence)
structure_prediction = model.generate(structure_prediction_prompt, structure_prediction_config)
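
For readers who have not opened the notebook, the num_steps arithmetic in the sequence config can be sketched in plain Python. This is only an illustration and does not call the esm library: the helper name num_steps_for_prompt and the example prompt are hypothetical, and it assumes, as the snippet above does, that "_" marks a masked position in the prompt.

```python
# Hypothetical helper (not part of the esm library): mirrors the
# `sequence_prompt.count("_") // 2` expression from the snippet above.
def num_steps_for_prompt(sequence_prompt: str, positions_per_step: int = 2) -> int:
    """Return the number of decoding steps for a prompt with "_" masks."""
    masked_positions = sequence_prompt.count("_")
    return masked_positions // positions_per_step


# Illustrative prompt: 4 fixed residues around 10 masked positions.
example_prompt = "MK" + "_" * 10 + "LV"
example_steps = num_steps_for_prompt(example_prompt)
print(example_steps)
```

With 10 masked positions this gives 5 steps, i.e. an average of two masked positions per step. How many positions are actually filled in at each step is decided by the library, not by this arithmetic.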

I want to predict functional annotations given a sequence, so I set track to "function" in GenerationConfig (as shown below), but it does not work:

function_prediction_config = GenerationConfig(
    track="function",
    num_steps=len(structure_prediction) // 2,
    temperature=0.7,
)

function_prediction_prompt = ESMProtein(sequence=structure_prediction.sequence)
function_prediction = model.generate(function_prediction_prompt, function_prediction_config)

I get the following error:

ValueError: Sampling only masked tokens is undefined for function tokens.

What am I doing wrong? Thank you so much!

santiag0m commented 4 months ago

Check https://github.com/evolutionaryscale/esm/issues/24#issuecomment-2218361262