evo-design / evo

Biological foundation modeling from molecular to genome scale
Apache License 2.0

Running the model in a firewalled environment #26

Closed — ernestmordret closed 4 months ago

ernestmordret commented 4 months ago

Hi! Thank you very much for the model, the pre-print looks fantastic!

I'd like to use your Hugging Face model on our A100 GPUs, but unfortunately we have to work in a firewalled environment, which complicates everything. I'm allowed to access the internet through the "submit" machines, which do not have GPUs, and then I have to switch to offline GPU machines to run the model.

For now, I have attempted to split the process in two. First, use the submit machine to download the model locally:

git clone git@hf.co:togethercomputer/evo-1-131k-base
Cloning into 'evo-1-131k-base'...
remote: Enumerating objects: 134, done.
remote: Counting objects: 100% (131/131), done.
remote: Compressing objects: 100% (130/130), done.
remote: Total 134 (delta 54), reused 0 (delta 0), pack-reused 3
Receiving objects: 100% (134/134), 58.39 KiB | 4.49 MiB/s, done.
Resolving deltas: 100% (54/54), done.
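One thing worth checking at this point: a plain git clone without git-lfs installed fetches only small text pointer stubs in place of the multi-gigabyte weight shards. A minimal sketch to tell the two apart (the directory name matches the clone above; the helper names are my own):

```python
import struct
from pathlib import Path

def is_lfs_pointer(path):
    # A Git LFS pointer stub is a short text file beginning with this line.
    with open(path, 'rb') as f:
        return f.read(24).startswith(b'version https://git-lfs')

def looks_like_safetensors(path):
    # A real safetensors file starts with an 8-byte little-endian header
    # length, immediately followed by a JSON header that opens with '{'.
    with open(path, 'rb') as f:
        prefix = f.read(9)
    if len(prefix) < 9:
        return False
    (header_len,) = struct.unpack('<Q', prefix[:8])
    return 0 < header_len < 100_000_000 and prefix[8:9] == b'{'

for shard in sorted(Path('evo-1-131k-base').glob('*.safetensors')):
    kind = 'LFS pointer stub' if is_lfs_pointer(shard) else (
        'binary weights' if looks_like_safetensors(shard) else 'unknown')
    print(f'{shard.name}: {kind}')
```

If the shards report as pointer stubs, installing git-lfs and running `git lfs pull` inside the clone should fetch the real files.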

And then switch to my GPU machine and load the model with something like

# load_evo_gpu.py
from transformers import AutoConfig, AutoModelForCausalLM
import torch

if torch.cuda.is_available():
    print('Connected to a GPU\n')
else:
    print('Not connected to a GPU\n')

from psutil import virtual_memory
ram_gb = virtual_memory().total / 1e9
print('Your runtime has {:.1f} gigabytes of available RAM\n'.format(ram_gb))

model_name = 'evo-1-131k-base'

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    local_files_only=True,
)

but this time it fails miserably and spits out this error:

Loading checkpoint shards:   0%|                                           | 0/3 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/pasteur/zeus/projets/p01/MDM/Users/ernest/load_evo_gpu.py", line 15, in <module>
    model = AutoModelForCausalLM.from_pretrained(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pasteur/appa/homes/ermordre/miniconda3/envs/evo/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 556, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pasteur/appa/homes/ermordre/miniconda3/envs/evo/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3502, in from_pretrained
    ) = cls._load_pretrained_model(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pasteur/appa/homes/ermordre/miniconda3/envs/evo/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3903, in _load_pretrained_model
    state_dict = load_state_dict(shard_file)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pasteur/appa/homes/ermordre/miniconda3/envs/evo/lib/python3.11/site-packages/transformers/modeling_utils.py", line 505, in load_state_dict
    with safe_open(checkpoint_file, framework="pt") as f:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
safetensors_rust.SafetensorError: Error while deserializing header: HeaderTooLarge

Any idea why this might be the case?
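For readers who hit the same wall: HeaderTooLarge is the classic symptom of a Git LFS pointer stub being loaded in place of real weights. safetensors reads the first 8 bytes of the file as the length of its JSON header; when the file is actually ASCII pointer text, those bytes decode to an absurdly large number. A minimal demonstration:

```python
import struct

# safetensors interprets the first 8 bytes of a .safetensors file as the
# length of its JSON header (unsigned little-endian 64-bit integer).
# A Git LFS pointer stub instead begins with the ASCII text
# "version https://git-lfs...", so those 8 bytes decode to a
# nonsensically huge "header length", and deserialization bails out.
(bogus_len,) = struct.unpack('<Q', b'version ')
print(f'Claimed header length: {bogus_len:,} bytes')
```

Fetching the actual shards, either with `git lfs pull` inside the clone or by downloading on the submit machine with `huggingface_hub.snapshot_download`, makes the error go away.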

Zymrael commented 4 months ago

I think this is a transformers library issue, not related to Evo

ernestmordret commented 4 months ago

You're absolutely right! I finally managed to fix it by running the following script AFTER running git clone git@hf.co:togethercomputer/evo-1-131k-base

from transformers import AutoConfig, AutoModelForCausalLM

model_name = 'togethercomputer/evo-1-8k-base'

offline = False
use_cache = True
local_files_only = False
model_config = AutoConfig.from_pretrained(model_name, trust_remote_code=True, offline=offline, use_cache=use_cache, local_files_only=local_files_only)
model_config.use_cache = True

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    config=model_config,
    trust_remote_code=True,
    offline=offline,
    use_cache=use_cache, 
    local_files_only=local_files_only
)

model.save_pretrained('evo-1-8k-base')

Now it seems to work; looking forward to experimenting with it!
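For anyone replicating this two-machine workflow, a quick stdlib-only sanity check (the helper name and file list are my own assumptions, not an official contract) that the directory written by save_pretrained looks complete before carrying it to the offline GPU node:

```python
from pathlib import Path

def offline_ready(model_dir):
    """Return True if the directory looks complete enough to load offline:
    a config.json plus at least one weight file (.safetensors or .bin)."""
    d = Path(model_dir)
    has_config = (d / 'config.json').is_file()
    has_weights = any(d.glob('*.safetensors')) or any(d.glob('*.bin'))
    return has_config and has_weights

print(offline_ready('evo-1-8k-base'))
```

On the GPU machine itself, pointing AutoModelForCausalLM.from_pretrained at that directory with local_files_only=True and trust_remote_code=True (Evo ships custom modeling code alongside the checkpoint) should then load without any network access; exporting HF_HUB_OFFLINE=1 additionally guarantees no call is ever attempted.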

Zymrael commented 4 months ago

Great!