amoskalev opened this issue 8 months ago
Here's how I'm currently solving this (adapted from the usage example in the README):
from evo import Evo
import torch
device = 'cuda:0'
evo_model = Evo('evo-1-131k-base')
model, tokenizer = evo_model.model, evo_model.tokenizer
model.to(device)
model.eval()
# monkey patch the unembed function with identity
# this removes the final projection back from the embedding space into tokens
# so the "logits" returned by the model are now the final-layer embeddings
# see source for unembed - https://huggingface.co/togethercomputer/evo-1-131k-base/blob/main/model.py#L339
from torch import nn
class CustomEmbedding(nn.Module):
    def unembed(self, u):
        return u

model.unembed = CustomEmbedding()
# end custom code
sequence = 'ACGT'
input_ids = torch.tensor(
    tokenizer.tokenize(sequence),
    dtype=torch.int,
).to(device).unsqueeze(0)
embed, _ = model(input_ids) # (batch, length, embed dim)
print('Embed: ', embed)
print('Shape (batch, length, embed dim): ', embed.shape)
# you can now use embedding for downstream classification tasks
# you probably want to aggregate over position dimension
# e.g. mean value = embed.mean(dim=1) or final token embedding = embed[:, -1, :]
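To turn the per-position embeddings into a single vector for a downstream classifier, the pooling suggested in the comments above can be written out directly (variable names here are just illustrative):
# Mean-pool over the position dimension -> (batch, embed dim)
mean_embedding = embed.mean(dim=1)
# Or take the embedding of the final token (common for causal models)
last_token_embedding = embed[:, -1, :]
print('Pooled shape (batch, embed dim): ', mean_embedding.shape)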
Note that this is for the model object returned by the Evo class, which is an instance of StripedHyena. If you are using the Hugging Face version directly, the model is wrapped in StripedHyenaModelForCausalLM, so you need to do model.backbone.unembed = CustomEmbedding() instead.
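For the Hugging Face route, a minimal sketch might look like the following; the exact loading arguments (revision, dtype, tokenizer) are assumptions and should follow the model card, but the key point is that the patch goes on model.backbone:
import torch
from transformers import AutoConfig, AutoModelForCausalLM

model_name = 'togethercomputer/evo-1-131k-base'
config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)
hf_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    config=config,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,  # assumption: load in BF16 to save memory
)
hf_model.to('cuda:0')
hf_model.eval()

# The StripedHyena module is wrapped, so patch the backbone's unembed
hf_model.backbone.unembed = CustomEmbedding()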
Thanks @davidkell!
@davidkell I tried your code on an A100 40GB using the evo-8k model. Embedding the 4-letter sequence from the example costs over 400 MB of GPU RAM, and the model itself needs 13 GB. I don't understand why it costs so much memory; 4 x 4096 values in BF16 should only take 32 KB, right? I tried to embed a 2 kb sequence but always ran out of CUDA memory. Does anyone have a similar problem?
I had a similar experience. I was able to get inference working for 2 kb sequences on an A100 80GB (e.g. available on Paperspace), although around 2.5-3 kb I would get OOM. I haven't looked in depth at what is driving the memory requirement.
Quoting from this issue https://github.com/evo-design/evo/issues/24:
Prompting with longer sequences requires sharding for the model, which is currently not supported
So I think if you want to generate embeddings for longer sequences, you will need to manually shard the model across GPUs, set up CPU offloading, or something like that.
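Until sharding lands, one thing worth checking is whether the forward pass runs with autograd enabled: the snippet above does not disable gradient tracking, and the activations kept around for backprop can dominate memory for long inputs. A minimal sketch, reusing model, tokenizer, and device from the earlier snippet (the 2 kb test sequence is just an example):
# Run the embedding extraction without autograd bookkeeping
import torch

sequence = 'ACGT' * 500  # example ~2 kb input
input_ids = torch.tensor(
    tokenizer.tokenize(sequence),
    dtype=torch.int,
).to(device).unsqueeze(0)

with torch.inference_mode():
    embed, _ = model(input_ids)  # (batch, length, embed dim)

# Pool and move the small result off the GPU before doing anything else
seq_repr = embed.float().mean(dim=1).cpu()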
Hi, thanks for your amazing work!
How can I extract representations rather than logits from the model?
I am using the Hugging Face version, and I see the model returns 'logits' and 'past_key_values'. Could you please explain what's in 'past_key_values', and whether either of those can be used as a sequence representation? Or maybe you can suggest other ways to access the representations of the model?