evo-design / evo

Biological foundation modeling from molecular to genome scale
Apache License 2.0

ModuleNotFoundError: No module named 'transformers_modules.togethercomputer.evo-1-131k-base.9562f3fdc38f09b92594864c5e98264f1bfbca33.tokenizer' #53

Open adrienchaton opened 5 months ago

adrienchaton commented 5 months ago

Hi all, and thanks for open-sourcing this interesting model!

I managed to install flash-attention and all the other packages, so I am able to import the evo package. But I am stuck with the following error: ModuleNotFoundError: No module named 'transformers_modules.togethercomputer.evo-1-131k-base.9562f3fdc38f09b92594864c5e98264f1bfbca33.tokenizer'

This happens whether I use the Evo package:

from evo import Evo
import torch
device = 'cuda:0'
evo_model = Evo('evo-1-131k-base') # here it crashes

or load directly from HF:

from transformers import AutoConfig, AutoModelForCausalLM
model_name = 'togethercomputer/evo-1-131k-base'
model_config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)
model_config.use_cache = True
model = AutoModelForCausalLM.from_pretrained(model_name, config=model_config, trust_remote_code=True) # here it crashes

The error points to transformers_modules.togethercomputer.evo-1-131k-base regardless of which Evo checkpoint I select. I tried updating transformers both to the latest release and to "4.36.2", as shown in https://huggingface.co/togethercomputer/evo-1-131k-base/blob/main/generation_config.json
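For reference, a quick way to confirm which transformers version is actually active in the environment I run Evo from:

import transformers
print(transformers.__version__)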

Any clue on how to solve this error please? Thanks!

Zymrael commented 5 months ago

What is the stack trace for the error you see when it crashes?

adrienchaton commented 5 months ago

Thanks for your reply. I am still stuck with this error and have not been able to use the Evo model yet.

Here is the full trace. The same happens if I try to load the phase 2 checkpoint, or if I load through the Evo package instead of the HuggingFace auto classes:

>>> model_config = AutoConfig.from_pretrained('togethercomputer/evo-1-8k-base', trust_remote_code=True)
>>> model = AutoModelForCausalLM.from_pretrained('togethercomputer/evo-1-8k-base', config=model_config, trust_remote_code=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "xxx/mambaforge/envs/bm-llms-minimal/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 550, in from_pretrained
    model_class = get_class_from_dynamic_module(
  File "xxx/mambaforge/envs/bm-llms-minimal/lib/python3.9/site-packages/transformers/dynamic_module_utils.py", line 501, in get_class_from_dynamic_module
    return get_class_in_module(class_name, final_module)
  File "xxx/mambaforge/envs/bm-llms-minimal/lib/python3.9/site-packages/transformers/dynamic_module_utils.py", line 201, in get_class_in_module
    module = importlib.machinery.SourceFileLoader(name, module_path).load_module()
  File "<frozen importlib._bootstrap_external>", line 529, in _check_name_wrapper
  File "<frozen importlib._bootstrap_external>", line 1029, in load_module
  File "<frozen importlib._bootstrap_external>", line 854, in load_module
  File "<frozen importlib._bootstrap>", line 274, in _load_module_shim
  File "<frozen importlib._bootstrap>", line 711, in _load
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "xxx/.cache/huggingface/modules/transformers_modules/togethercomputer/evo-1-131k-base/9562f3fdc38f09b92594864c5e98264f1bfbca33/modeling_hyena.py", line 11, in <module>
    from .model import StripedHyena
  File "xxx/.cache/huggingface/modules/transformers_modules/togethercomputer/evo-1-131k-base/9562f3fdc38f09b92594864c5e98264f1bfbca33/model.py", line 26, in <module>
    from .tokenizer import ByteTokenizer
ModuleNotFoundError: No module named 'transformers_modules.togethercomputer.evo-1-131k-base.9562f3fdc38f09b92594864c5e98264f1bfbca33.tokenizer'

To answer your question on the HuggingFace space: I tried transformers==4.36.2 as shown in the config file, and I am currently running with

# Name                    Version                   Build  Channel
transformers              4.39.3             pyhd8ed1ab_0    conda-forge

This is the first HuggingFace model with external (remote) classes that I have tried to run, so I have never come across such an error before ... Thanks for any hints!

adrienchaton commented 5 months ago

@Zymrael in case it is relevant: I also tried manually downloading the checkpoints and loading from the local copies, but it didn't help (same error). I also tried updating transformers to the latest version from the GitHub source, which produces the same error.

# Name                    Version                   Build  Channel
transformers              4.41.0.dev0              pypi_0    pypi
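For completeness, loading from a local clone looks roughly like this (the path is hypothetical; point it at wherever the checkpoint was cloned):

from transformers import AutoConfig, AutoModelForCausalLM

local_dir = '/path/to/evo-1-131k-base'  # hypothetical: a local `git clone` of the HF repo
model_config = AutoConfig.from_pretrained(local_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(local_dir, config=model_config, trust_remote_code=True)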
oliverfleetwood commented 5 months ago

I have the same issue

adrienchaton commented 5 months ago

@oliverfleetwood did you make any progress? @Zymrael I am not sure what causes this issue, but my guess is that some version mismatch may currently lead to it ... Would it make sense to build an env with everything pinned to the package versions you use when running Evo? Thanks in advance for any help; it's a pity not to be able to test the model ...

juliocesar-io commented 4 months ago

Hello all, I had the same issue and found a workaround: apparently with revision=1.1_fix the model cannot be downloaded from HF ... maybe a cache issue?

The error I got:

ModuleNotFoundError: No module named 'transformers_modules.togethercomputer.evo-1-131k-base.c206aab77ae5967a069c4200ecb1858588528c9d.tokenizer'

How to fix it

Change to revision='main' in the Evo class's load_checkpoint function, located in evo/models.py, like this:


    model_config = AutoConfig.from_pretrained(
        hf_model_name,
        trust_remote_code=True,
        revision='main', # change here
    )
    model_config.use_cache = True

    # Load model.
    model = AutoModelForCausalLM.from_pretrained(
        hf_model_name,
        config=model_config,
        trust_remote_code=True,
        revision='main', # change here
    )

Try to load the model again, e.g.:

from evo import Evo
import torch

device = 'cuda:0'

evo_model = Evo('evo-1-131k-base')
model, tokenizer = evo_model.model, evo_model.tokenizer
model.to(device)
model.eval()

sequence = 'ACGT'
input_ids = torch.tensor(
    tokenizer.tokenize(sequence),
    dtype=torch.int,
).to(device).unsqueeze(0)
logits, _ = model(input_ids) # (batch, length, vocab)

print('Logits: ', logits)
print('Shape (batch, length, vocab): ', logits.shape)

It should load the checkpoints. Make sure the model downloaded correctly from HF (https://huggingface.co/togethercomputer/evo-1-131k-base/tree/main) and check that your cache folder has all of those files under .cache/huggingface/modules/transformers_modules/togethercomputer/evo-1-131k-base/<commit_hash>. You can also manually download the files with git clone https://huggingface.co/togethercomputer/evo-1-131k-base and put them in the corresponding <commit_hash> folder; see the sketch below for one way to force a fresh download.
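A minimal sketch of forcing a fresh download, assuming huggingface_hub is available (it is installed as a dependency of transformers):

from huggingface_hub import snapshot_download

# Downloads (or re-validates) all files for the given revision and
# returns the local cache path where they landed.
local_dir = snapshot_download(
    repo_id='togethercomputer/evo-1-131k-base',
    revision='main',
)
print(local_dir)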

and then... it works. :)

Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:01<00:00,  2.82it/s]
---------5---------
Logits:  tensor([[[-13.8125, -23.2500, -23.2500,  ..., -23.2500, -23.2500, -23.2500],
         [ -6.6250, -21.1250, -21.1250,  ..., -21.1250, -21.1250, -21.1250],
         [ -7.2500, -20.8750, -20.8750,  ..., -20.8750, -20.8750, -20.8750],
         [ -8.1875, -20.5000, -20.5000,  ..., -20.5000, -20.5000, -20.5000]]],
       device='cuda:0', dtype=torch.bfloat16, grad_fn=<UnsafeViewBackward0>)
Shape (batch, length, vocab):  torch.Size([1, 4, 512])

If the above doesn't work, try the other model: evo_model = Evo('evo-1-8k-base')


Hope that helps!

adrienchaton commented 4 months ago

@juliocesar-io thanks a lot, this fixed my issue! FYI, I additionally needed to manually copy evo/configs/*.yml into my installed evo Python package (a sketch of that step is below).
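A minimal sketch of that copy step, assuming it is run from the root of a clone of this repo (paths are illustrative):

import glob
import os
import shutil

import evo

# Locate the installed evo package and mirror the repo's configs into it.
pkg_configs = os.path.join(os.path.dirname(evo.__file__), 'configs')
os.makedirs(pkg_configs, exist_ok=True)
for yml in glob.glob('evo/configs/*.yml'):
    shutil.copy(yml, pkg_configs)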

Following on this, I have a couple of questions (maybe @Zymrael knows too?) regarding the code snippet below, a helper function I wrote for myself:

import torch

from evo.scoring import prepare_batch  # assuming prepare_batch comes from evo's scoring utilities

@torch.inference_mode()
def compute_logits(evo_model, evo_tokenizer, sequences=["ATCG", "AATTCCGG"], cuda_device=0):
    assert isinstance(sequences, list), "fn. intended for batched processing with a list of input sequences"
    assert cuda_device >= 0, f"device {cuda_device} must be an int >= 0"
    input_ids, seq_lengths = prepare_batch(sequences, evo_tokenizer, prepend_bos=False, device=f'cuda:{cuda_device}')
    # --> input_ids are padded with ones
    # TODO: check against the default prepend_bos=True
    # TODO: what about the attention mask? i.e. lower triangular for masking future steps and always zero on PAD tokens
    logits, inference_params_dict_out = evo_model(input_ids, inference_params_dict=None, padding_mask=None)
    # logits with shape [batch=len(sequences), length=max(seq_lengths), vocab=512]
    # inference_params_dict_out = None
    return logits.cpu().float(), seq_lengths, inference_params_dict_out
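A usage sketch for the helper above, assuming model and tokenizer were loaded and moved to cuda:0 as in the earlier snippet:

logits, seq_lengths, _ = compute_logits(model, tokenizer, sequences=['ATCG', 'AATTCCGG'])
print(logits.shape)   # torch.Size([2, 8, 512]): (batch, max(seq_lengths), vocab)
print(seq_lengths)    # original per-sequence lengths before padding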

Thanks again for your assistance!

kjm981995 commented 2 months ago

From https://huggingface.co/togethercomputer/evo-1-131k-base/discussions/1#66687c801c2c85a4f938e825: it can be fixed simply by passing revision='1.1_fix' when loading the model, like this:

from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_name = 'togethercomputer/evo-1-8k-base'

model_config = AutoConfig.from_pretrained(model_name, trust_remote_code=True, revision='1.1_fix')
model_config.use_cache = True

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

evo_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    config=model_config,
    trust_remote_code=True,
    revision='1.1_fix',
)