dottxt-ai / outlines

Structured Text Generation
https://dottxt-ai.github.io/outlines/
Apache License 2.0
9.59k stars 493 forks

Running outlines with Idefics3 #1123

Closed joris-sense closed 2 months ago

joris-sense commented 2 months ago

Describe the issue as clearly as possible:

I am trying to run outlines with the new Idefics3 vision-language model and am running into an error. I am crossposting this as a comment in the Idefics3 PR here because I am not sure which side needs to change its code.

Roughly following the instructions in the doc here (using load_image rather than the image_from_url defined there), I am trying the code below and get the error below.

With some print statements, I found that the "<" in "Got <." in the error message is actually the first character of the prompt "<image> detailed description:". In other words, the prompt list is being treated as the image list, so each character of the prompt is passed to load_image as if it were an image URL.
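The effect can be reproduced without loading any model. The sketch below is a simplified stand-in for the processor's image loop, not the actual transformers code: the helper name collect_image_sources and the outer loop are assumptions made for illustration, and the prompt is assumed to begin with the `<image>` placeholder, as the "Got <." message suggests.

```python
def collect_image_sources(images):
    """Hypothetical helper: gather every string a nested image loop would try to load."""
    sources = []
    for sample in images:        # expected: one list of images per prompt
        for im in sample:        # expected: one image per entry
            if isinstance(im, str):
                # The real processor would hand this string to load_image().
                sources.append(im)
    return sources


# The prompt list ends up where the image list is expected.
prompts = ["<image> detailed description:"]
sources = collect_image_sources(prompts)

# Iterating over a string yields its characters, so the first "image source"
# is the first character of the prompt.
first = sources[0]
print(repr(first))
```

Because a Python string is itself iterable, the inner loop walks the prompt character by character, which matches the single character reported in the error message.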

Steps/code to reproduce the bug:

import outlines
from transformers import Idefics3ForConditionalGeneration

model = outlines.models.transformers_vision(
    "HuggingFaceM4/Idefics3-8B-Llama3",
    model_class=Idefics3ForConditionalGeneration,
    device="cuda",
)

from transformers.image_utils import load_image

description_generator = outlines.generate.text(model)
description_generator(
    ["<image> detailed description:"],
    [[load_image("https://upload.wikimedia.org/wikipedia/commons/2/25/Siam_lilacpoint.jpg")]]
)

Expected result:

It should work.

Error message:

---------------------------------------------------------------------------
UnidentifiedImageError Traceback (most recent call last)
File /opt/conda/lib/python3.10/site-packages/transformers/image_utils.py:372, in load_image(image, timeout)
    371 b64 = base64.decodebytes(image.encode())
--> 372 image = PIL.Image.open(BytesIO(b64))
    373 except Exception as e:

File /opt/conda/lib/python3.10/site-packages/PIL/Image.py:3309, in open(fp, mode, formats)
   3308 msg = "cannot identify image file %r" % (filename if filename else fp)
-> 3309 raise UnidentifiedImageError(msg)

UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x7f707e5b7a60>

During handling of the above exception, another exception occurred:

ValueError Traceback (most recent call last)
Cell In[2], line 13
     10 from transformers.image_utils import load_image
     12 description_generator = outlines.generate.text(model)
---> 13 description_generator(
     14 ["<image> detailed description:"],
     15 [[load_image("https://upload.wikimedia.org/wikipedia/commons/2/25/Siam_lilacpoint.jpg")]]
     16 )

File /opt/conda/lib/python3.10/site-packages/outlines/generate/api.py:555, in VisionSequenceGeneratorAdapter.__call__(self, prompts, media, max_tokens, stop_at, seed, **model_specific_params)
    549 prompts, media = self._validate_prompt_media_types(prompts, media)
    551 generation_params = self.prepare_generation_parameters(
    552 max_tokens, stop_at, seed
    553 )
--> 555 completions = self.model.generate(
    556 prompts,
    557 media,
    558 generation_params,
    559 self.logits_processor,
    560 self.sampling_params,
    561 **model_specific_params,
    562 )
    564 return self._format(completions)

File /opt/conda/lib/python3.10/site-packages/outlines/models/transformers_vision.py:46, in TransformersVision.generate(self, prompts, media, generation_parameters, logits_processor, sampling_parameters)
     15 def generate( # type: ignore
     16 self,
     17 prompts: Union[str, List[str]],
   (...)
     21 sampling_parameters: SamplingParameters,
     22 ) -> Union[str, List[str], List[List[str]]]:
     23 """Generate text using `transformers`.
     24
     25 Arguments
   (...)
     44 The generated text
     45 """
---> 46 inputs = self.processor(prompts, media, padding=True, return_tensors="pt").to(
     47 self.model.device
     48 )
     50 generation_kwargs = self._get_generation_kwargs(
     51 prompts,
     52 generation_parameters,
     53 logits_processor,
     54 sampling_parameters,
     55 )
     56 generated_ids = self._generate_output_seq(prompts, inputs, **generation_kwargs)

File /opt/conda/lib/python3.10/site-packages/transformers/models/idefics3/processing_idefics3.py:288, in Idefics3Processor.__call__(self, images, text, audio, videos, **kwargs)
    286 new_images[-1].append(im) # already loaded
    287 elif isinstance(im, str):
--> 288 new_images[-1].append(load_image(im))
    290 images = new_images
    291 del new_images

File /opt/conda/lib/python3.10/site-packages/transformers/image_utils.py:374, in load_image(image, timeout)
    372 image = PIL.Image.open(BytesIO(b64))
    373 except Exception as e:
--> 374 raise ValueError(
    375 f"Incorrect image source. Must be a valid URL starting with `http://` or `https://`, a valid path to an image file, or a base64 encoded string. Got {image}. Failed with {e}"
    376 )
    377 elif isinstance(image, PIL.Image.Image):
    378 image = image

ValueError: Incorrect image source. Must be a valid URL starting with `http://` or `https://`, a valid path to an image file, or a base64 encoded string. Got <. Failed with cannot identify image file <_io.BytesIO object at 0x7f707e5b7a60>

Outlines/Python version information:

master branch, github PR for idefics3 https://github.com/huggingface/transformers/pull/32473

root@C.12143599:/$ python -c "from outlines import _version; print(_version.version)"
0.0.47.dev62+g900762b
root@C.12143599:/$ python -c "import sys; print('Python', sys.version)"
Python 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]
root@C.12143599:/$ pip freeze
accelerate==0.33.0
aiohappyeyeballs==2.4.0
aiohttp==3.10.5
aiosignal==1.3.1
annotated-types==0.7.0
anyio==4.3.0
archspec @ file:///croot/archspec_1697725767277/work
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
asttokens @ file:///opt/conda/conda-bld/asttokens_1646925590279/work
astunparse==1.6.3
async-lru==2.0.4
async-timeout==4.0.3
attrs @ file:///croot/attrs_1695717823297/work
Babel==2.14.0
bash_kernel==0.9.3
beautifulsoup4 @ file:///croot/beautifulsoup4-split_1681493039619/work
bleach==6.1.0
boltons @ file:///croot/boltons_1677628692245/work
Brotli @ file:///tmp/abs_ecyw11_7ze/croots/recipe/brotli-split_1659616059936/work
certifi @ file:///croot/certifi_1707229174982/work/certifi
cffi @ file:///croot/cffi_1700254295673/work
chardet @ file:///home/builder/ci_310/chardet_1640804867535/work
charset-normalizer @ file:///tmp/build/80754af9/charset-normalizer_1630003229654/work
click @ file:///croot/click_1698129812380/work
cloudpickle==3.0.0
comm==0.2.1
conda @ file:///croot/conda_1696257509808/work
conda-build @ file:///croot/conda-build_1708025865815/work
conda-content-trust @ file:///croot/conda-content-trust_1693490622020/work
conda-libmamba-solver @ file:///croot/conda-libmamba-solver_1691418897561/work/src
conda-package-handling @ file:///croot/conda-package-handling_1690999929514/work
conda_index @ file:///croot/conda-index_1706633791028/work
conda_package_streaming @ file:///croot/conda-package-streaming_1690987966409/work
cryptography @ file:///croot/cryptography_1707523700518/work
datasets==2.21.0
debugpy==1.8.1
decorator @ file:///opt/conda/conda-bld/decorator_1643638310831/work
defusedxml==0.7.1
dill==0.3.8
diskcache==5.6.3
distro @ file:///croot/distro_1701455004953/work
dnspython==2.6.1
exceptiongroup @ file:///croot/exceptiongroup_1706031385326/work
executing @ file:///opt/conda/conda-bld/executing_1646925071911/work
expecttest==0.2.1
fastjsonschema==2.19.1
filelock @ file:///croot/filelock_1700591183607/work
fqdn==1.5.1
frozenlist==1.4.1
fsspec==2024.2.0
gmpy2 @ file:///tmp/build/80754af9/gmpy2_1645455533097/work
h11==0.14.0
httpcore==1.0.4
httpx==0.27.0
huggingface-hub==0.24.6
hypothesis==6.98.10
idna @ file:///croot/idna_1666125576474/work
iniconfig==2.0.0
interegular==0.3.3
ipykernel==6.29.2
ipython @ file:///croot/ipython_1704833016303/work
ipywidgets==8.1.2
isoduration==20.11.0
jedi @ file:///tmp/build/80754af9/jedi_1644315229345/work
Jinja2 @ file:///croot/jinja2_1706733616596/work
json5==0.9.17
jsonpatch @ file:///tmp/build/80754af9/jsonpatch_1615747632069/work
jsonpointer==2.1
jsonschema @ file:///croot/jsonschema_1699041609003/work
jsonschema-specifications @ file:///croot/jsonschema-specifications_1699032386549/work
jupyter==1.0.0
jupyter-archive==3.4.0
jupyter-console==6.6.3
jupyter-events==0.9.0
jupyter-http-over-ws==0.0.8
jupyter-lsp==2.2.2
jupyter_client==8.6.0
jupyter_core==5.7.1
jupyter_server==2.12.5
jupyter_server_terminals==0.5.2
jupyterlab==4.1.2
jupyterlab_pygments==0.3.0
jupyterlab_server==2.25.3
jupyterlab_widgets==3.0.10
lark==1.2.2
libarchive-c @ file:///tmp/build/80754af9/python-libarchive-c_1617780486945/work
libmambapy @ file:///croot/mamba-split_1698782620632/work/libmambapy
llvmlite==0.43.0
MarkupSafe @ file:///croot/markupsafe_1704205993651/work
matplotlib-inline @ file:///opt/conda/conda-bld/matplotlib-inline_1662014470464/work
menuinst @ file:///croot/menuinst_1706732933928/work
mistune==3.0.2
mkl-fft @ file:///croot/mkl_fft_1695058164594/work
mkl-random @ file:///croot/mkl_random_1695059800811/work
mkl-service==2.4.0
more-itertools @ file:///croot/more-itertools_1700662129964/work
mpmath @ file:///croot/mpmath_1690848262763/work
multidict==6.0.5
multiprocess==0.70.16
nbclient==0.9.0
nbconvert==7.16.1
nbformat==5.9.2
nbzip==0.1.0
nest-asyncio==1.6.0
networkx @ file:///croot/networkx_1690561992265/work
notebook==7.1.0
notebook_shim==0.2.4
numba==0.60.0
numpy @ file:///croot/numpy_and_numpy_base_1704311704800/work/dist/numpy-1.26.3-cp310-cp310-linux_x86_64.whl#sha256=a281f24b826e51f1c25bdd24960ab44b4bc294c65d81560441ba7fffd8ddd2a7
optree==0.10.0
outlines @ git+https://github.com/outlines-dev/outlines.git@900762b0f240c6220549d7d8594a221e6c9845e8
overrides==7.7.0
packaging @ file:///croot/packaging_1693575174725/work
pandas==2.2.2
pandocfilters==1.5.1
parso @ file:///opt/conda/conda-bld/parso_1641458642106/work
pexpect @ file:///tmp/build/80754af9/pexpect_1605563209008/work
pillow @ file:///croot/pillow_1707233021655/work
pkginfo @ file:///croot/pkginfo_1679431160147/work
platformdirs @ file:///croot/platformdirs_1692205439124/work
pluggy==1.4.0
prometheus_client==0.20.0
prompt-toolkit @ file:///croot/prompt-toolkit_1704404351921/work
psutil @ file:///opt/conda/conda-bld/psutil_1656431268089/work
ptyprocess @ file:///tmp/build/80754af9/ptyprocess_1609355006118/work/dist/ptyprocess-0.7.0-py2.py3-none-any.whl
pure-eval @ file:///opt/conda/conda-bld/pure_eval_1646925070566/work
pyairports==2.1.1
pyarrow==17.0.0
pycosat @ file:///croot/pycosat_1696536503704/work
pycountry==24.6.1
pycparser @ file:///tmp/build/80754af9/pycparser_1636541352034/work
pydantic==2.8.2
pydantic_core==2.20.1
Pygments @ file:///croot/pygments_1684279966437/work
pyOpenSSL @ file:///croot/pyopenssl_1708380408460/work
PySocks @ file:///home/builder/ci_310/pysocks_1640793678128/work
pytest==8.0.1
python-dateutil==2.8.2
python-etcd==0.4.5
python-json-logger==2.0.7
pytz @ file:///croot/pytz_1695131579487/work
PyYAML @ file:///croot/pyyaml_1698096049011/work
pyzmq==25.1.2
qtconsole==5.5.1
QtPy==2.4.1
referencing @ file:///croot/referencing_1699012038513/work
regex==2024.7.24
requests==2.32.3
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rpds-py @ file:///croot/rpds-py_1698945930462/work
ruamel.yaml @ file:///croot/ruamel.yaml_1666304550667/work
ruamel.yaml.clib @ file:///croot/ruamel.yaml.clib_1666302247304/work
safetensors==0.4.4
Send2Trash==1.8.2
six @ file:///tmp/build/80754af9/six_1644875935023/work
sniffio==1.3.0
sortedcontainers==2.4.0
soupsieve @ file:///croot/soupsieve_1696347547217/work
stack-data @ file:///opt/conda/conda-bld/stack_data_1646927590127/work
sympy @ file:///croot/sympy_1701397643339/work
terminado==0.18.0
tinycss2==1.2.1
tokenizers==0.19.1
tomli @ file:///opt/conda/conda-bld/tomli_1657175507142/work
toolz @ file:///croot/toolz_1667464077321/work
torch==2.2.1
torchaudio==2.2.1
torchelastic==0.2.2
torchvision==0.17.1
tornado==6.4
tqdm==4.66.5
traitlets @ file:///croot/traitlets_1671143879854/work
transformers @ git+https://github.com/huggingface/transformers.git@046c88ea50562bd4bdc3798c9f1e4ab84a6e4b13
triton==2.2.0
truststore @ file:///croot/truststore_1695244293384/work
types-dataclasses==0.6.6
types-python-dateutil==2.8.19.20240106
typing_extensions @ file:///croot/typing_extensions_1705599297034/work
tzdata==2024.1
uri-template==1.3.0
urllib3 @ file:///croot/urllib3_1707770551213/work
wcwidth @ file:///Users/ktietz/demo/mc3/conda-bld/wcwidth_1629357192024/work
webcolors==1.13
webencodings==0.5.1
websocket-client==1.7.0
widgetsnbextension==4.0.10
xxhash==3.5.0
yarl==1.9.4
zstandard @ file:///croot/zstandard_1677013143055/work
root@C.12143599:/$

Context for the issue:

Idefics3 is one of the first VLMs based on LLaMA 3.1, which is a significant improvement over LLaMA 3.

joris-sense commented 2 months ago

I tracked this down to the argument order of the Idefics3 processor: its __call__ takes images first and text second (see here), whereas outlines passes the prompts first and the media second as positional arguments, so the two end up swapped. Swapping them back in the code makes it work for me, at least when passing a list of prompts and a list of lists of images.
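A minimal sketch of the mismatch, using a toy function in place of the real processor. The name toy_processor and the placeholder image value are hypothetical; only the (images, text) parameter order and the two call styles are taken from the tracebacks in this thread.

```python
def toy_processor(images=None, text=None, **kwargs):
    """Hypothetical stand-in that only mimics the (images, text) parameter order."""
    return {"images": images, "text": text}


prompts = ["<image> detailed description:"]
media = [["<placeholder for a PIL image>"]]

# Positional call in (prompts, media) order: the prompts land in `images`.
positional = toy_processor(prompts, media, padding=True, return_tensors="pt")

# Keyword call: each argument reaches the intended parameter
# regardless of the order in the signature.
keyword = toy_processor(text=prompts, images=media, padding=True, return_tensors="pt")

print(positional["images"] is prompts)  # True: the arguments are swapped
print(keyword["text"] is prompts)       # True: the arguments are bound correctly
```

Passing the arguments by keyword keeps the call correct for processors with either parameter order, which is why it is a more robust choice than swapping the positional arguments.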

Glider95 commented 1 week ago

Hello,

I believe this is related to this issue: there still seems to be a problem with Idefics3ForConditionalGeneration.

Passing an image as input raises an IndexError (list index out of range).

Model load:

import outlines
from transformers import Idefics3ForConditionalGeneration

model = outlines.models.transformers_vision(
    "HuggingFaceM4/Idefics3-8B-Llama3",
    model_class=Idefics3ForConditionalGeneration,
    device="cuda",
)

from transformers.image_utils import load_image

description_generator = outlines.generate.text(model)
description_generator(
    [" detailed description:"],
    [[load_image("./image1.jpg")]]
)

Error:

---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
Cell In[4], line 18
     15 from transformers.image_utils import load_image
     17 description_generator = outlines.generate.text(model)
---> 18 description_generator(
     19 [" detailed description:"],
     20 [[load_image("Image.JPG")]]
     21 )

File ~/envs/default/lib/python3.10/site-packages/outlines/generate/api.py:556, in VisionSequenceGeneratorAdapter.__call__(self, prompts, media, max_tokens, stop_at, seed, **model_specific_params)
    550 prompts, media = self._validate_prompt_media_types(prompts, media)
    552 generation_params = self.prepare_generation_parameters(
    553 max_tokens, stop_at, seed
    554 )
--> 556 completions = self.model.generate(
    557 prompts,
    558 media,
    559 generation_params,
    560 copy(self.logits_processor),
    561 self.sampling_params,
    562 **model_specific_params,
    563 )
    565 return self._format(completions)

File ~/envs/default/lib/python3.10/site-packages/outlines/models/transformers_vision.py:46, in TransformersVision.generate(self, prompts, media, generation_parameters, logits_processor, sampling_parameters)
     15 def generate( # type: ignore
     16 self,
     17 prompts: Union[str, List[str]],
   (...)
     21 sampling_parameters: SamplingParameters,
     22 ) -> Union[str, List[str], List[List[str]]]:
     23 """Generate text using `transformers`.
     24
     25 Arguments
   (...)
     44 The generated text
     45 """
---> 46 inputs = self.processor(
     47 text=prompts, images=media, padding=True, return_tensors="pt"
     48 ).to(self.model.device)
     50 generation_kwargs = self._get_generation_kwargs(
     51 prompts,
     52 generation_parameters,
     53 logits_processor,
     54 sampling_parameters,
     55 )
     56 generated_ids = self._generate_output_seq(prompts, inputs, **generation_kwargs)

File ~/envs/default/lib/python3.10/site-packages/transformers/models/idefics3/processing_idefics3.py:302, in Idefics3Processor.__call__(self, images, text, audio, videos, image_seq_len, **kwargs)
    300 sample = split_sample[0]
    301 for i, image_prompt_string in enumerate(image_prompt_strings):
--> 302 sample += image_prompt_string + split_sample[i + 1]
    303 prompt_strings.append(sample)
    305 text_inputs = self.tokenizer(text=prompt_strings, **output_kwargs["text_kwargs"])

IndexError: list index out of range
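One way the failing line in the traceback can raise this error is a prompt that contains fewer image placeholders than there are images. The sketch below reproduces only the loop shown in the traceback (lines 300 to 302). The function name interleave, the "<image>" token value, and the assumption that split_sample comes from splitting the prompt on that token are all illustrative guesses, not confirmed details of the transformers implementation, and this may not be the cause of the error reported above.

```python
IMAGE_TOKEN = "<image>"  # assumed placeholder token


def interleave(prompt, image_prompt_strings):
    """Hypothetical re-creation of the loop shown in the traceback."""
    # Assumption: the prompt is split on the image placeholder.
    split_sample = prompt.split(IMAGE_TOKEN)
    sample = split_sample[0]
    for i, image_prompt_string in enumerate(image_prompt_strings):
        # Needs one more text piece than there are images.
        sample += image_prompt_string + split_sample[i + 1]
    return sample


# One placeholder, one image: the pieces line up.
ok = interleave("<image> detailed description:", ["[IMG]"])

# No placeholder, one image: split() returns a single piece,
# so split_sample[1] does not exist.
try:
    interleave(" detailed description:", ["[IMG]"])
    error = None
except IndexError as exc:
    error = exc

print(ok)
print(type(error).__name__)
```

Under these assumptions, checking that the number of placeholders in each prompt matches the number of images passed for it would be a reasonable first thing to verify.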