TransformerLensOrg / TransformerLens

A library for mechanistic interpretability of GPT-style language models
https://transformerlensorg.github.io/TransformerLens/
MIT License

[Question] Offline Error HookedTransformer.from_pretrained #664

Closed pbernabeup closed 1 month ago

pbernabeup commented 1 month ago

Question

I am attempting to run a model in an offline environment using the following code:

import os
import transformers
from transformer_lens import HookedTransformer

base_path = "."

tinystories = transformers.AutoModelForCausalLM.from_pretrained(
    os.path.join(base_path, "models/TinyStories-1Layer-21M"),
    local_files_only=True,
)

hook_trf = HookedTransformer.from_pretrained(
    model_name=os.path.join(base_path, "models/TinyStories-1Layer-21M"),
    model_from_pretrained_kwargs={"hf_model": tinystories},
)

If I set model_name to the path of my local clone of the repo, TransformerLens attempts to convert it to a valid official name and throws this error:

Traceback (most recent call last):
  File "/gpfs/projects/bsc70/hpai/storage/data/mechanistic-interpretability/cache_activations.py", line 14, in <module>
    hook_trf = HookedTransformer.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformer_lens/HookedTransformer.py", line 1238, in from_pretrained
    official_model_name = loading.get_official_model_name(model_name)
  File "/usr/local/lib/python3.10/dist-packages/transformer_lens/loading_from_pretrained.py", line 689, in get_official_model_name
    raise ValueError(
ValueError: ./models/TinyStories-1Layer-21M not found. Valid official model names (excl aliases): [...]

However, if I set model_name to the name of the HF repo, it tries to reach Hugging Face and throws the following error:

Traceback (most recent call last):
    hook_trf = HookedTransformer.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformer_lens/HookedTransformer.py", line 1243, in from_pretrained
    cfg = loading.get_pretrained_model_config(
  File "/usr/local/lib/python3.10/dist-packages/transformer_lens/loading_from_pretrained.py", line 1414, in get_pretrained_model_config
    cfg_dict = convert_hf_model_config(official_model_name, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformer_lens/loading_from_pretrained.py", line 718, in convert_hf_model_config
    hf_config = AutoConfig.from_pretrained(
  File "/usr/local/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 934, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/transformers/configuration_utils.py", line 632, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/transformers/configuration_utils.py", line 689, in _get_config_dict
    resolved_config_file = cached_file(
  File "/usr/local/python3.10/site-packages/transformers/utils/hub.py", line 442, in cached_file
    raise EnvironmentError(
OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like roneneldan/TinyStories-1Layer-21M is not the path to a directory containing a file named config.json.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.

How can I fix this?

Yang-bug-star commented 1 month ago

Have you solved it?

pbernabeup commented 1 month ago

I have managed to find a workaround for this issue.

If your local path coincides with the official model name, HookedTransformer.from_pretrained() detects it correctly and loads it. In this case, I solved it with the following code:

hook_trf = HookedTransformer.from_pretrained(
    model_name="roneneldan/TinyStories-1Layer-21M",
    local_files_only=True,
)

Where "roneneldan/TinyStories-1Layer-21M" is both the path of the model in my system and in HuggingFace.

Additionally, if you intend to use a fine-tuned version of a supported model, you can rename the local folder to the base model's official name and it will load correctly. For example, I downloaded Aloe, a medical fine-tune of Llama 3, changed the folder path from "HPAI-BSC/Llama3-Aloe-8B-Alpha" to "meta-llama/Meta-Llama-3-8B", and loaded it with this code:

hook_trf = HookedTransformer.from_pretrained(
    model_name="meta-llama/Meta-Llama-3-8B",
    local_files_only=True,
)

This can also be done in one step with git clone <finetune_hf_repo_url> <supported_hf_repo_name>, which clones the fine-tune directly into a folder named after the supported model.

bryce13950 commented 1 month ago

When dealing with TransformerLens, it's best to keep model names as the official names used on Hugging Face. What I gather from your code is that you have a local repo where you reorganize your models? If I am wrong on that, let me know.

If that is the case, a little info: TransformerLens uses whatever you pass as the model name to load a bunch of different things needed to run the model within TransformerLens, and most of the loading of the actual model is offloaded to transformers. Before it gets to that point, TransformerLens tries to guess the model's official name by checking a map of aliases to see if a specific alias has been configured. That means that if you use a custom path that differs from the official name, you greatly lower the chance of it figuring out what the actual model is, since someone would have had to previously configure that exact name as an alias pointing to the official model name.

For all of these reasons, it is almost always best to load the model by its official name and keep things as they are. If you want to add an alias, I can show you how to do that, but I am going to close this unless you want to do that.
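For illustration, a rough sketch of the name resolution described above, based on loading_from_pretrained in recent TransformerLens versions (MODEL_ALIASES and get_official_model_name exist there; mutating the alias map at runtime is only an illustrative hack, not a supported API):

import transformer_lens.loading_from_pretrained as loading

# MODEL_ALIASES maps each official Hugging Face name to a list of known aliases.
# get_official_model_name() builds the reverse map and looks up the name you pass.
loading.MODEL_ALIASES.setdefault("roneneldan/TinyStories-1Layer-21M", []).append(
    "./models/TinyStories-1Layer-21M"  # treat a custom local path as an alias
)

# The custom path now resolves to the official name instead of raising ValueError.
print(loading.get_official_model_name("./models/TinyStories-1Layer-21M"))

Note that this only fixes the name lookup; the config and weights are still loaded through transformers, so fully offline use still relies on the local path matching the official name, as in the workaround above.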