foundation-model-stack / fms-hf-tuning

🚀 Collection of tuning recipes with HuggingFace SFTTrainer and PyTorch FSDP.
Apache License 2.0

deps: Add protobuf to support ALLaM models #328

Closed willmj closed 2 months ago

willmj commented 2 months ago

Description of the change

Add protobuf v5.28.0 to the fms-hf-tuning dependencies. Some models, such as ALLaM, need protobuf when transformers converts their slow tokenizer to a fast one.

Related issue number

How to verify the PR

Was the PR tested

anhuong commented 2 months ago

Note that the following error occurs when loading the tokenizer for an ALLaM model without protobuf installed:

ERROR:sft_trainer.py:Traceback (most recent call last):
  File "/home/tuning/.local/lib/python3.11/site-packages/tuning/sft_trainer.py", line 577, in main
    trainer = train(
              ^^^^^^
  File "/home/tuning/.local/lib/python3.11/site-packages/tuning/sft_trainer.py", line 195, in train
    tokenizer = AutoTokenizer.from_pretrained(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tuning/.local/lib/python3.11/site-packages/transformers/models/auto/tokenization_auto.py", line 916, in from_pretrained
    return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tuning/.local/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2271, in from_pretrained
    return cls._from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/tuning/.local/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2505, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tuning/.local/lib/python3.11/site-packages/transformers/models/llama/tokenization_llama_fast.py", line 157, in __init__
    super().__init__(
  File "/home/tuning/.local/lib/python3.11/site-packages/transformers/tokenization_utils_fast.py", line 118, in __init__
    fast_tokenizer = convert_slow_tokenizer(slow_tokenizer)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tuning/.local/lib/python3.11/site-packages/transformers/convert_slow_tokenizer.py", line 1597, in convert_slow_tokenizer
    return converter_class(transformer_tokenizer).converted()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tuning/.local/lib/python3.11/site-packages/transformers/convert_slow_tokenizer.py", line 538, in __init__
    requires_backends(self, "protobuf")
  File "/home/tuning/.local/lib/python3.11/site-packages/transformers/utils/import_utils.py", line 1531, in requires_backends
    raise ImportError("".join(failed))
ImportError: 
LlamaConverter requires the protobuf library but it was not found in your environment. Checkout the instructions on the
installation page of its repo: https://github.com/protocolbuffers/protobuf/tree/master/python#installation and follow the ones
that match your environment. Please note that you may need to restart your runtime after installation.
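
The traceback above only surfaces the missing dependency deep inside the tokenizer conversion. As a minimal sketch, a check like the following could be run before loading the tokenizer so the failure is reported early. The helper names `protobuf_available` and `require_protobuf` are hypothetical and are not part of fms-hf-tuning or transformers; the only assumption about protobuf is that its runtime is importable as `google.protobuf`.

```python
import importlib.util


def protobuf_available() -> bool:
    """Return True if the protobuf runtime (google.protobuf) can be found."""
    try:
        return importlib.util.find_spec("google.protobuf") is not None
    except ModuleNotFoundError:
        # find_spec raises when the parent "google" package itself is missing.
        return False


def require_protobuf() -> None:
    """Hypothetical pre-flight check: fail with a short, actionable message."""
    if not protobuf_available():
        raise ImportError(
            "protobuf is required to convert this tokenizer; "
            "install it with `pip install protobuf`."
        )
```

Calling `require_protobuf()` before `AutoTokenizer.from_pretrained(...)` would raise the same kind of `ImportError`, but at a point where the cause is easier to see.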

anhuong commented 2 months ago

Squashing the commits here. To add this change to the main branch, we would cherry-pick the single squashed commit that contains it.