evo-design / evo

Biological foundation modeling from molecular to genome scale
Apache License 2.0

Max Seq length for inference #24

Closed: JunboShen closed this issue 2 weeks ago

JunboShen commented 4 months ago

May I ask what the proper range of input sequence lengths is for inference with the evo-1-131k-base model? I tried running on a single A100 and got a CUDA out-of-memory error when inputting a single sequence longer than 1,000 tokens. Thank you!

Zymrael commented 4 months ago

Prompting with longer sequences requires sharding the model, which is currently not supported. However, you can generate much longer sequences, up to 500k tokens and beyond, on a single 80 GB GPU: generation decodes one token at a time against a cached state, while a long prompt has to be processed in full at once, which is what exhausts memory.

If you'd like to test the model with longer prompts, I recommend Together's API.
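
For reference, something like the following should work with the together Python client; the served model name ("togethercomputer/evo-1-131k-base") and its availability on the API are assumptions, so check Together's model listing first.

import os
from together import Together

# Assumption: TOGETHER_API_KEY is set in the environment and the Evo model
# is served under this exact name on Together's API.
client = Together(api_key=os.environ["TOGETHER_API_KEY"])

response = client.completions.create(
    model="togethercomputer/evo-1-131k-base",
    prompt="ACGTACGT",  # toy DNA prompt
    max_tokens=256,
    temperature=1.0,
)
print(response.choices[0].text)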

pan-genome commented 1 month ago

> Prompting with longer sequences requires sharding the model, which is currently not supported. However, you can generate much longer sequences, up to 500k tokens and beyond, on a single 80 GB GPU.
>
> If you'd like to test the model with longer prompts, I recommend Together's API.

Could you elaborate on how to generate 500k tokens on a single 80 GB GPU? I got OOM on an A100 with a 3 kb sequence. Thank you!

brianhie commented 1 month ago

@pan-genome we were able to just use the standard HuggingFace sampling API (e.g., loading with AutoModelForCausalLM.from_pretrained() and sampling with model.generate()) to generate 500k+ tokens on an 80 GB GPU.

pan-genome commented 2 weeks ago

> @pan-genome we were able to just use the standard HuggingFace sampling API (e.g., loading with AutoModelForCausalLM.from_pretrained() and sampling with model.generate()) to generate 500k+ tokens on an 80 GB GPU.

Could you provide a working code example? Thank you!

brianhie commented 2 weeks ago

Something like the following (imports and a toy prompt added so it runs end to end):

import torch
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_name = 'togethercomputer/evo-1-131k-base'

# Load the config and raise the maximum sequence length to 500k.
model_config = AutoConfig.from_pretrained(
    model_name,
    trust_remote_code=True,
    revision="1.1_fix",
)
model_config.max_seqlen = 500_000

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    config=model_config,
    trust_remote_code=True,
    revision="1.1_fix",
)
model.cuda()

# The byte-level tokenizer is assumed to load from the same model repo.
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True,
    revision="1.1_fix",
)
input_ids = tokenizer('ACGT', return_tensors='pt').input_ids.cuda()

# do_sample=True is needed for temperature and top_k to take effect.
outputs = model.generate(
    input_ids,
    max_new_tokens=500_000,
    do_sample=True,
    temperature=1.,
    top_k=4,
)
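
If memory is still tight, loading the weights in half precision should help, e.g. passing torch_dtype=torch.float16 to from_pretrained() (not in the snippet above, but a standard transformers option that roughly halves the weight footprint). Also note that sampling 500k tokens autoregressively can take quite a while, even on an A100.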