RedHatQuickCourses / rhods-intro

Introduction to Red Hat OpenShift AI (RHOAI)
https://redhatquickcourses.github.io/rhods-intro/

Text Generation Demo Test Errors With Downloaded Model #35

Closed: jdandrea closed this issue 10 months ago

jdandrea commented 10 months ago

Using an OpenShift AI Sandbox cluster with the Chapter 1 Demo Lab, Step 5, with the cloned repo, a new PyTorch workbench/storage, and a model downloaded into my_model, the "Run the Tests" step fails with ValueError exceptions and the tracebacks below.

generator = pipeline("text-generation", model="./my_model", tokenizer=tokenizer)
print(generator(prompt)[0]["generated_text"].strip())
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[17], line 1
----> 1 generator = pipeline("text-generation", model="./my_model", tokenizer=tokenizer)
      2 print(generator(prompt)[0]["generated_text"].strip())

File /opt/app-root/lib64/python3.9/site-packages/transformers/pipelines/__init__.py:834, in pipeline(task, model, config, tokenizer, feature_extractor, image_processor, framework, revision, use_fast, token, device, device_map, torch_dtype, trust_remote_code, model_kwargs, pipeline_class, **kwargs)
    832 if isinstance(model, str) or framework is None:
    833     model_classes = {"tf": targeted_task["tf"], "pt": targeted_task["pt"]}
--> 834     framework, model = infer_framework_load_model(
    835         model,
    836         model_classes=model_classes,
    837         config=config,
    838         framework=framework,
    839         task=task,
    840         **hub_kwargs,
    841         **model_kwargs,
    842     )
    844 model_config = model.config
    845 hub_kwargs["_commit_hash"] = model.config._commit_hash

File /opt/app-root/lib64/python3.9/site-packages/transformers/pipelines/base.py:282, in infer_framework_load_model(model, config, model_classes, task, framework, **model_kwargs)
    280         for class_name, trace in all_traceback.items():
    281             error += f"while loading with {class_name}, an error is thrown:\n{trace}\n"
--> 282         raise ValueError(
    283             f"Could not load model {model} with any of the following classes: {class_tuple}. See the original errors:\n\n{error}\n"
    284         )
    286 if framework is None:
    287     framework = infer_framework(model.__class__)

ValueError: Could not load model ./my_model with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>, <class 'transformers.models.gpt2.modeling_gpt2.GPT2LMHeadModel'>). See the original errors:

while loading with AutoModelForCausalLM, an error is thrown:
Traceback (most recent call last):
  File "/opt/app-root/lib64/python3.9/site-packages/transformers/modeling_utils.py", line 484, in load_state_dict
    return torch.load(checkpoint_file, map_location=map_location)
  File "/opt/app-root/lib64/python3.9/site-packages/torch/serialization.py", line 815, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/opt/app-root/lib64/python3.9/site-packages/torch/serialization.py", line 1033, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '\xe6'.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/app-root/lib64/python3.9/site-packages/transformers/modeling_utils.py", line 488, in load_state_dict
    if f.read(7) == "version":
  File "/usr/lib64/python3.9/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe6 in position 0: invalid continuation byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/app-root/lib64/python3.9/site-packages/transformers/pipelines/base.py", line 269, in infer_framework_load_model
    model = model_class.from_pretrained(model, **kwargs)
  File "/opt/app-root/lib64/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 565, in from_pretrained
    return model_class.from_pretrained(
  File "/opt/app-root/lib64/python3.9/site-packages/transformers/modeling_utils.py", line 3019, in from_pretrained
    state_dict = load_state_dict(resolved_archive_file)
  File "/opt/app-root/lib64/python3.9/site-packages/transformers/modeling_utils.py", line 500, in load_state_dict
    raise OSError(
OSError: Unable to load weights from pytorch checkpoint file for './my_model/pytorch_model.bin' at './my_model/pytorch_model.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.

while loading with GPT2LMHeadModel, an error is thrown:
Traceback (most recent call last):
  File "/opt/app-root/lib64/python3.9/site-packages/transformers/modeling_utils.py", line 484, in load_state_dict
    return torch.load(checkpoint_file, map_location=map_location)
  File "/opt/app-root/lib64/python3.9/site-packages/torch/serialization.py", line 815, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/opt/app-root/lib64/python3.9/site-packages/torch/serialization.py", line 1033, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '\xe6'.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/app-root/lib64/python3.9/site-packages/transformers/modeling_utils.py", line 488, in load_state_dict
    if f.read(7) == "version":
  File "/usr/lib64/python3.9/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe6 in position 0: invalid continuation byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/app-root/lib64/python3.9/site-packages/transformers/pipelines/base.py", line 269, in infer_framework_load_model
    model = model_class.from_pretrained(model, **kwargs)
  File "/opt/app-root/lib64/python3.9/site-packages/transformers/modeling_utils.py", line 3019, in from_pretrained
    state_dict = load_state_dict(resolved_archive_file)
  File "/opt/app-root/lib64/python3.9/site-packages/transformers/modeling_utils.py", line 500, in load_state_dict
    raise OSError(
OSError: Unable to load weights from pytorch checkpoint file for './my_model/pytorch_model.bin' at './my_model/pytorch_model.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
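For context, the root _pickle.UnpicklingError: invalid load key, '\xe6' at the top of both tracebacks means the very first byte of pytorch_model.bin is not a valid checkpoint header, which typically indicates a truncated or corrupted file rather than a framework mismatch. As a rough diagnostic (a sketch, not part of the course material; the magic-byte heuristic and helper name are assumptions), one can inspect the leading bytes before handing the file to torch.load:

```python
def looks_like_torch_checkpoint(path):
    """Heuristic: a file saved by torch.save starts either with the
    zip magic b"PK\\x03\\x04" (modern zip-based format) or with the
    pickle PROTO opcode 0x80 (legacy format). A first byte such as
    0xe6, as seen in the traceback above, matches neither, which
    suggests a truncated or corrupted download."""
    with open(path, "rb") as f:
        head = f.read(4)
    return head.startswith(b"PK\x03\x04") or head.startswith(b"\x80")

# Hypothetical usage against the exercise's model directory:
# looks_like_torch_checkpoint("./my_model/pytorch_model.bin")
```

This only rules files in or out cheaply; it does not prove the checkpoint is complete, so a size or checksum comparison (as suggested below in the thread) is still worthwhile.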
jramcast commented 10 months ago

Hi @jdandrea , thanks for the feedback.

Can you verify that the pytorch_model.bin is not corrupt? You might want to re-download the file to your computer and then upload it again to the workbench.

If the download/upload process completed correctly, you should see something like this when you run the ls command in the notebook (pay attention to the file size): [screenshot: ls output showing the expected size of pytorch_model.bin]

You might also verify the file's sha1sum. If the file is not corrupted, its sha1sum should be 73ecab61b4c09b814e097d85440a942836b3713a. You can add a cell to perform this verification:

[screenshot: notebook cell computing the sha1sum of pytorch_model.bin]
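Such a cell might look like the following sketch (the helper name is an assumption; the expected hash and the ./my_model path are taken from the maintainer's comment and the exercise):

```python
import hashlib

def sha1sum(path, chunk_size=1 << 20):
    """Compute the SHA-1 hex digest of a file, reading in 1 MiB
    chunks so large checkpoints do not need to fit in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Compare against the value given in the thread:
# sha1sum("./my_model/pytorch_model.bin") == \
#     "73ecab61b4c09b814e097d85440a942836b3713a"
```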

Also, which PyTorch workbench version are you using? The exercise was developed for the PyTorch 2023.1 workbench, which includes PyTorch v1.13. I noticed, however, that the PyTorch 2023.2 workbench is now available, including PyTorch v2.0.0, but I just tested it and the notebook also works fine with that version.

jdandrea commented 10 months ago

Hi @jramcast - thanks for those pointers!

Pilot error on my part. I must not have allowed the file to complete uploading. Beg pardon!