evolutionaryscale / esm


How to get embeddings from multiple sequences in a single step #120

Open francescopatane96 opened 1 week ago

francescopatane96 commented 1 week ago

Hi there! Congrats on the great work.

I'm trying to get embeddings for multiple sequences in a single batch, but I get this error:

```
AttributeError                            Traceback (most recent call last)
in <cell line: 19>()
     17 ]
     18 
---> 19 tensor = model.batch_generate(proteins, configs)

3 frames
/usr/local/lib/python3.10/dist-packages/esm/models/esm3.py in batch_generate(self, inputs, configs)
    419 
    420         if isinstance(inputs[0], ESMProtein):
--> 421             return iterative_sampling_raw(self, inputs, configs)  # type: ignore
    422         elif isinstance(inputs[0], ESMProteinTensor):
    423             return iterative_sampling_tokens(

/usr/local/lib/python3.10/dist-packages/esm/utils/generation.py in iterative_sampling_raw(client, proteins, configs)
    105     input_tokens = [client.encode(protein) for protein in proteins]
    106 
--> 107     output_tokens_list = client.batch_generate(input_tokens, configs)
    108 
    109     raw_proteins: list[ESMProtein | ESMProteinError] = []

/usr/local/lib/python3.10/dist-packages/esm/models/esm3.py in batch_generate(self, inputs, configs)
    421             return iterative_sampling_raw(self, inputs, configs)  # type: ignore
    422         elif isinstance(inputs[0], ESMProteinTensor):
--> 423             return iterative_sampling_tokens(
    424                 self,
    425                 inputs,  # type: ignore

/usr/local/lib/python3.10/dist-packages/esm/utils/generation.py in iterative_sampling_tokens(client, input_tokens, configs, tokenizers)
    331     # Clear structure tokens if user would like to condition only on coordinates.
    332     for tokens, config in zip(sampled_tokens, configs):
--> 333         if config.condition_on_coordinates_only and tokens.coordinates is not None:
    334             tokens.structure = None
    335 

AttributeError: 'SamplingConfig' object has no attribute 'condition_on_coordinates_only'
```

Here is the code I wrote:

```python
protein1 = ESMProtein(
    sequence=(
        "FIFLALLGAAVAFPVDDDDKIVGGYTCGANTVPYQVSLNSGYHFCGGSLINSQWVVSAAHCYKSGIQVRLGEDNINVVEG"
    )
)
protein2 = ESMProtein(
    sequence=(
        "FIFLALLGAAVAFPVDDDDKIVGGYTCGANTVPYQVSLNSGYHFCGGSLINSQWVVSAAHCYKSGIQVRLGEDNINVVEG"
    )
)

proteins = [protein1, protein2]

configs = [
    SamplingConfig(return_per_residue_embeddings=True),
    SamplingConfig(return_per_residue_embeddings=True),
]

tensor = model.batch_generate(proteins, configs)
```

Thank you for the help,

Francesco

santiag0m commented 1 week ago

Hi @francescopatane96, which version of the esm package are you using?

Can you try installing the latest version and running the code again?
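
For anyone following along, one way to answer the version question is to read the installed package metadata from Python. This is only a minimal sketch: it assumes the package was installed under the distribution name `esm`, and `installed_version` is a hypothetical helper name, not part of the esm package.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Assumption: the package is installed under the distribution name "esm".
print("esm version:", installed_version("esm"))
```

If this prints `None`, the package is probably not installed in the environment the notebook is using. Otherwise the printed version can be pasted into the thread, and `pip install --upgrade esm` should pull in the latest release under the same naming assumption.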