dottxt-ai / outlines

Structured Text Generation
https://dottxt-ai.github.io/outlines/
Apache License 2.0

Modal example image build fails #1098

Open psimm opened 3 months ago

psimm commented 3 months ago

Describe the issue as clearly as possible:

When running the example code from https://outlines-dev.github.io/outlines/cookbook/deploy-using-modal/ on Modal, I get an error in outlines_image.run_function(import_model).

The HF_TOKEN variable is correctly set.

Error:

Stopping app - uncaught exception raised locally: RemoteError("Image build for im-MiWdGqu5JS8eOH9zxxNH4Z failed with the exception:\nException('data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 40 column 3')").

Also, the text says "We download the Mistral-7B-v0.1 model from HuggingFace" but the code uses "mistralai/Mistral-7B-Instruct-v0.2". I tried it with both models and ran into the same error.

Steps/code to reproduce the bug:

from modal import Image, App, gpu

app = App(name="outlines-app")

outlines_image = Image.debian_slim(python_version="3.11").pip_install(
    "outlines==0.0.37",
    "transformers==4.38.2",
    "datasets==2.18.0",
    "accelerate==0.27.2",
)

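def import_model():
    import os

    os.environ["HF_TOKEN"] = "YOUR_HUGGINGFACE_TOKEN"
    import outlines

    outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2")


# Build step that fails: runs import_model inside the image to cache the model.
outlines_image = outlines_image.run_function(import_model)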

Expected result:

The model should load correctly and be saved in the Modal image.

Error message:

Building image im-MiWdGqu5JS8eOH9zxxNH4Z

=> Step 0: running function 'import_model'
/usr/local/lib/python3.11/site-packages/huggingface_hub/file_download.py:1150: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
/usr/local/lib/python3.11/site-packages/huggingface_hub/file_download.py:1150: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Downloading shards: 100%|██████████| 3/3 [00:56<00:00, 18.78s/it]
Loading checkpoint shards: 100%|██████████| 3/3 [00:17<00:00,  5.68s/it]
Traceback (most recent call last):
  File "/pkg/modal/_container_io_manager.py", line 629, in handle_input_exception
    yield
  File "/pkg/modal/_container_entrypoint.py", line 383, in run_input_sync
    res = io_context.call_finalized_function()
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/pkg/modal/_container_io_manager.py", line 148, in call_finalized_function
    res = self.finalized_function.callable(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/main.py", line 19, in import_model
    outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2")
  File "/usr/local/lib/python3.11/site-packages/outlines/models/transformers.py", line 222, in transformers
    tokenizer = AutoTokenizer.from_pretrained(model_name, **tokenizer_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/transformers/models/auto/tokenization_auto.py", line 825, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2048, in from_pretrained
    return cls._from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2287, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/transformers/models/llama/tokenization_llama_fast.py", line 133, in __init__
    super().__init__(
  File "/usr/local/lib/python3.11/site-packages/transformers/tokenization_utils_fast.py", line 111, in __init__
    fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Exception: data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 40 column 3
Saving image...
Image saved, took 6.03s
Finished image build for im-MiWdGqu5JS8eOH9zxxNH4Z
⠴ Creating objects...
├── 🔨 Created mount /Users/paulsimmering/Documents/Py.nosync/constrained-labeling/main.py
└── 🔨 Created function import_model.
[Thread-1 (thread_inner)] 2024-08-14T13:14:02+0200 Exception when resolving Image()
Traceback (most recent call last):
  File "/Users/paulsimmering/Documents/Py.nosync/constrained-labeling/.venv/lib/python3.11/site-packages/modal/_resolver.py", line 142, in load
    return await cached_future
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/paulsimmering/Documents/Py.nosync/constrained-labeling/.venv/lib/python3.11/site-packages/modal/_resolver.py", line 121, in loader
    await obj._load(obj, self, existing_object_id)
  File "/Users/paulsimmering/Documents/Py.nosync/constrained-labeling/.venv/lib/python3.11/site-packages/modal/image.py", line 445, in _load
    raise RemoteError(f"Image build for {image_id} failed with the exception:\n{result.exception}")
modal.exception.RemoteError: Image build for im-MiWdGqu5JS8eOH9zxxNH4Z failed with the exception:
Exception('data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 40 column 3')
[Thread-1 (thread_inner)] 2024-08-14T13:14:02+0200 Exception when resolving Image()
Traceback (most recent call last):
  File "/Users/paulsimmering/Documents/Py.nosync/constrained-labeling/.venv/lib/python3.11/site-packages/modal/_resolver.py", line 142, in load
    return await cached_future
           ^^^^^^^^^^^^^^^^^^^
modal.exception.RemoteError: Image build for im-MiWdGqu5JS8eOH9zxxNH4Z failed with the exception:
Exception('data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 40 column 3')
[Thread-1 (thread_inner)] 2024-08-14T13:14:02+0200 Exception when resolving Image()
Traceback (most recent call last):
  File "/Users/paulsimmering/Documents/Py.nosync/constrained-labeling/.venv/lib/python3.11/site-packages/modal/_resolver.py", line 142, in load
    return await cached_future
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/paulsimmering/Documents/Py.nosync/constrained-labeling/.venv/lib/python3.11/site-packages/modal/_resolver.py", line 117, in loader
    await TaskContext.gather(*[self.load(dep) for dep in obj.deps()])
  File "/Users/paulsimmering/Documents/Py.nosync/constrained-labeling/.venv/lib/python3.11/site-packages/modal/_utils/async_utils.py", line 229, in gather
    results = await asyncio.gather(*(tc.create_task(coro) for coro in coros))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paulsimmering/Documents/Py.nosync/constrained-labeling/.venv/lib/python3.11/site-packages/modal/_resolver.py", line 142, in load
    return await cached_future
           ^^^^^^^^^^^^^^^^^^^
modal.exception.RemoteError: Image build for im-MiWdGqu5JS8eOH9zxxNH4Z failed with the exception:
Exception('data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 40 column 3')
[Thread-1 (thread_inner)] 2024-08-14T13:14:02+0200 Exception when resolving Function(import_model)
Traceback (most recent call last):
  File "/Users/paulsimmering/Documents/Py.nosync/constrained-labeling/.venv/lib/python3.11/site-packages/modal/_resolver.py", line 142, in load
    return await cached_future
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/paulsimmering/Documents/Py.nosync/constrained-labeling/.venv/lib/python3.11/site-packages/modal/_resolver.py", line 117, in loader
    await TaskContext.gather(*[self.load(dep) for dep in obj.deps()])
  File "/Users/paulsimmering/Documents/Py.nosync/constrained-labeling/.venv/lib/python3.11/site-packages/modal/_utils/async_utils.py", line 229, in gather
    results = await asyncio.gather(*(tc.create_task(coro) for coro in coros))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paulsimmering/Documents/Py.nosync/constrained-labeling/.venv/lib/python3.11/site-packages/modal/_resolver.py", line 142, in load
    return await cached_future
           ^^^^^^^^^^^^^^^^^^^
modal.exception.RemoteError: Image build for im-MiWdGqu5JS8eOH9zxxNH4Z failed with the exception:
Exception('data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 40 column 3')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/paulsimmering/Documents/Py.nosync/constrained-labeling/.venv/lib/python3.11/site-packages/modal/_resolver.py", line 142, in load
    return await cached_future
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/paulsimmering/Documents/Py.nosync/constrained-labeling/.venv/lib/python3.11/site-packages/modal/_resolver.py", line 117, in loader
    await TaskContext.gather(*[self.load(dep) for dep in obj.deps()])
  File "/Users/paulsimmering/Documents/Py.nosync/constrained-labeling/.venv/lib/python3.11/site-packages/modal/_utils/async_utils.py", line 228, in gather
    async with TaskContext() as tc:
asyncio.exceptions.CancelledError
[Thread-1 (thread_inner)] 2024-08-14T13:14:02+0200 Exception when resolving Function(generate)
Traceback (most recent call last):
  File "/Users/paulsimmering/Documents/Py.nosync/constrained-labeling/.venv/lib/python3.11/site-packages/modal/_resolver.py", line 142, in load
    return await cached_future
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/paulsimmering/Documents/Py.nosync/constrained-labeling/.venv/lib/python3.11/site-packages/modal/_resolver.py", line 117, in loader
    await TaskContext.gather(*[self.load(dep) for dep in obj.deps()])
  File "/Users/paulsimmering/Documents/Py.nosync/constrained-labeling/.venv/lib/python3.11/site-packages/modal/_utils/async_utils.py", line 229, in gather
    results = await asyncio.gather(*(tc.create_task(coro) for coro in coros))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paulsimmering/Documents/Py.nosync/constrained-labeling/.venv/lib/python3.11/site-packages/modal/_resolver.py", line 142, in load
    return await cached_future
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/paulsimmering/Documents/Py.nosync/constrained-labeling/.venv/lib/python3.11/site-packages/modal/_resolver.py", line 117, in loader
    await TaskContext.gather(*[self.load(dep) for dep in obj.deps()])
  File "/Users/paulsimmering/Documents/Py.nosync/constrained-labeling/.venv/lib/python3.11/site-packages/modal/_utils/async_utils.py", line 229, in gather
    results = await asyncio.gather(*(tc.create_task(coro) for coro in coros))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/paulsimmering/Documents/Py.nosync/constrained-labeling/.venv/lib/python3.11/site-packages/modal/_resolver.py", line 142, in load
    return await cached_future
           ^^^^^^^^^^^^^^^^^^^
Stopping app - uncaught exception raised locally: RemoteError("Image build for im-MiWdGqu5JS8eOH9zxxNH4Z failed with the exception:\nException('data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 40 column 3')").
Error: Image build for im-MiWdGqu5JS8eOH9zxxNH4Z failed with the exception:
Exception('data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 40 column 3')

Outlines/Python version information:

outlines_image = Image.debian_slim(python_version="3.11").pip_install(
    "outlines==0.0.37",
    "transformers==4.38.2",
    "datasets==2.18.0",
    "accelerate==0.27.2",
)
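The pins above do not include `tokenizers`. `transformers` installs it as a dependency, and it is the library that raises the `PyPreTokenizerTypeWrapper` error through `TokenizerFast.from_file` in the traceback. A likely explanation, though not confirmed here, is that an older `tokenizers` cannot parse the model's current `tokenizer.json`.

The following is a minimal sketch for reporting the versions that were actually installed. You could call it, for example, at the top of `import_model`. The helper name `installed_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    # tokenizers is not pinned in the image, so print it alongside the pinned packages.
    packages = ["outlines", "transformers", "tokenizers", "datasets", "accelerate"]
    for name, ver in installed_versions(packages).items():
        print(f"{name}: {ver or 'not installed'}")
```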

Context for the issue:

No response

lapp0 commented 3 months ago

Can you try pip install transformers outlines datasets accelerate -U and report back whether that resolves your issue? You're on an old version of transformers and outlines.
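The following is a small sketch that lists problems if the environment has older versions than the ones reported as working later in this thread (outlines 0.0.46, transformers 4.44.0). Treating those versions as minimums is an assumption, not a documented requirement.

`parse_release`, `at_least`, `check_minimums`, and `MINIMUMS` are hypothetical names. The comparison only looks at the leading numeric release segment and ignores suffixes such as `rc1`, `.dev1`, or `+cu121`.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_release(ver):
    """Turn e.g. '4.44.0' or '0.0.46.dev1' into a tuple of leading integers."""
    parts = []
    for piece in ver.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # non-numeric segment such as 'dev1'
        parts.append(int(match.group()))
        if match.end() != len(piece):
            break  # numeric prefix followed by a suffix such as 'rc1' or '+cu121'
    return tuple(parts)


def at_least(installed, minimum):
    """Compare release tuples after zero-padding them to the same length."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_minimums(minimums):
    """Return human-readable problems; an empty list means every floor is met."""
    problems = []
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if not at_least(installed, minimum):
            problems.append(f"{name} {installed} is older than {minimum}")
    return problems


# Versions reported working in this thread, used as floors (assumption).
MINIMUMS = {"outlines": "0.0.46", "transformers": "4.44.0"}

if __name__ == "__main__":
    for problem in check_minimums(MINIMUMS):
        print(problem)
```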

psimm commented 3 months ago

@lapp0 Thanks, I reran with the latest versions of the libraries and it worked. Since this runs on Modal, I pinned them by version number instead of using the pip install command you suggested.

I used "mistralai/Mistral-7B-v0.1" throughout, which worked. Currently, the cookbook's text and inference code use that model, but the import_model function uses "mistralai/Mistral-7B-Instruct-v0.2". This causes a mismatch between the model cached at build time and the model loaded for inference.

Could you please update the cookbook example to use the same model throughout and use updated package versions?

outlines_image = Image.debian_slim(python_version="3.11").pip_install(
    "outlines==0.0.46",
    "transformers==4.44.0",
    "datasets==2.21.0",
    "accelerate==0.33.0",
)
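The mismatch described above goes away if the download step and the inference step read the model name from one place. The following is a minimal sketch of that pattern. It assumes the cookbook's structure of a build-time download function plus a later loading function.

`MODEL_NAME`, `download_model`, and `load_model` are hypothetical names. The `loader` argument exists only so the sketch can run without outlines installed. It defaults to `outlines.models.transformers`, the call used in the report.

```python
MODEL_NAME = "mistralai/Mistral-7B-v0.1"  # single source of truth for the model


def _default_loader(name):
    # Imported lazily so this module can be imported where outlines is absent.
    import outlines

    return outlines.models.transformers(name)


def download_model(loader=_default_loader):
    """Build-time step: loading once downloads the weights into the image."""
    loader(MODEL_NAME)


def load_model(loader=_default_loader):
    """Inference-time step: uses the same MODEL_NAME as the build step."""
    return loader(MODEL_NAME)
```

On Modal, `download_model` would take the place of `import_model` in `outlines_image.run_function(...)`. The inference code would call `load_model()`, so both steps refer to the same checkpoint.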