evolutionaryscale / esm


error getting sequence embeddings. #138

Open santule opened 2 days ago

santule commented 2 days ago

Hi, I am processing many sequences from a FASTA file, but the esm3-small model fails on the sequence "EHVAATHKTGLDALAELTGAALNSVEKLSELQFQTVRASLEDSTEQGKRVFDARSLHELTALQSEVSQPTEKLVAYGRHLYQIAAGTHAEWRKVAQTRA". I trimmed the sequence to find where exactly it fails. Below are the longest prefix that still works and the shortest prefix that fails.

Working

import numpy as np
import esm.sdk
from esm.sdk.api import ESMProtein, SamplingConfig

model = esm.sdk.client("esm3-small", token=my_token)
protein = ESMProtein(
    sequence=(
        "EHVAATHKTGLDALAEL"
    )
)
protein_tensor = model.encode(protein)

output2 = np.array(model.forward_and_sample(
        protein_tensor, SamplingConfig(return_mean_embedding=True)).mean_embedding)
print(output2.shape)
(1536,)

Not Working

model = esm.sdk.client("esm3-small", token=my_token)
protein = ESMProtein(
    sequence=(
        "EHVAATHKTGLDALAELT"
    )
)
protein_tensor = model.encode(protein)

output2 = np.array(model.forward_and_sample(
        protein_tensor, SamplingConfig(return_mean_embedding=True)).mean_embedding)
print(output2.shape)

/usr/local/lib/python3.10/dist-packages/esm/sdk/forge.py in forward_and_sample(self, input, sampling_configuration)
    326         }
    327 
--> 328         req["sequence"] = maybe_list(input.sequence)
    329         req["structure"] = maybe_list(input.structure)
    330         req["secondary_structure"] = maybe_list(input.secondary_structure)

AttributeError: 'ESMProteinError' object has no attribute 'sequence'

Am I using the correct model? I did not get this error with the open model on Hugging Face.

Thanks for your help. Regards, Sanjana

ebetica commented 1 day ago

What is the message of the error? I can think of a network blip, or maybe the sequence hits our safety filter.
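The traceback suggests that `model.encode` returned an `ESMProteinError` object instead of raising, so the underlying message was lost once that object was passed on to `forward_and_sample`. A minimal sketch of a guard that surfaces the message is below. `StandInProteinError`, `unwrap_or_raise`, and the field names `error_code` and `error_msg` are assumptions made for illustration, not confirmed SDK API.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass
class StandInProteinError:
    """Hypothetical stand-in for the SDK's error object; field names are assumed."""
    error_code: int
    error_msg: str


def unwrap_or_raise(result: Any, error_types: Tuple[type, ...] = (StandInProteinError,)) -> Any:
    """Return `result` unchanged, or raise with the error details if it is an error object."""
    if isinstance(result, error_types):
        # getattr with defaults, because the real attribute names are not confirmed here.
        code = getattr(result, "error_code", "unknown")
        msg = getattr(result, "error_msg", repr(result))
        raise RuntimeError(f"request failed (code {code}): {msg}")
    return result
```

With the real SDK this might be used as `protein_tensor = unwrap_or_raise(model.encode(protein), error_types=(ESMProteinError,))`, assuming `ESMProteinError` can be imported from `esm.sdk.api`. The raised message should then show whether the failure is a network problem or a filter rejection.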

santule commented 1 day ago

I also initially thought it might be a network glitch, but I tried multiple times and it still doesn't work. As soon as you substitute the last "T" with some other amino acid, it works. The error is: 'ESMProteinError' object has no attribute 'sequence'. The details of the error are pasted in the issue too. Can you check whether it is a safety filter issue?

Thanks, Sanjana
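The manual narrowing described in this thread (trimming the sequence, then substituting the last residue) could be automated roughly as follows. This is only a sketch: `works` is a hypothetical callback that returns True when a request for a sequence succeeds, and `stand_in_works` merely mimics the behaviour reported above so that the example runs without the API.

```python
from typing import Callable, List, Optional

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def shortest_failing_prefix(sequence: str, works: Callable[[str], bool]) -> Optional[str]:
    """Scan prefixes from shortest to longest and return the first one that fails."""
    for end in range(1, len(sequence) + 1):
        prefix = sequence[:end]
        if not works(prefix):
            return prefix
    return None  # every prefix worked


def failing_substitutions(sequence: str, position: int, works: Callable[[str], bool]) -> List[str]:
    """Try each standard residue at `position` (0-based) and list the ones that fail."""
    failing = []
    for residue in AMINO_ACIDS:
        variant = sequence[:position] + residue + sequence[position + 1:]
        if not works(variant):
            failing.append(residue)
    return failing


def stand_in_works(seq: str) -> bool:
    # Hypothetical stand-in for a real API call: it fails exactly on
    # sequences that begin with the 18-residue prefix reported in this issue.
    return not seq.startswith("EHVAATHKTGLDALAELT")
```

A linear scan is used rather than a binary search because nothing here guarantees that every extension of a failing prefix also fails. With a real `works` callback each check costs one API request, so the scan may be slow for long sequences.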