joris-sense closed this issue 2 months ago
I tracked this down to the order of the `images` and `text` arguments being swapped in the Idefics3 processor, see here. Swapping them back in the code makes it work for me, at least when passing a list of prompts and a list of lists of images.
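To make that kind of mismatch concrete, here is a minimal sketch with a hypothetical stand-in class (`ToyProcessor` is not the real `Idefics3Processor`, and its type check is invented for illustration). It assumes a processor that declares `images` before `text` while the caller passes prompts first; passing both as keyword arguments removes the dependence on order.

```python
from typing import Optional, Tuple


class ToyProcessor:
    """Hypothetical processor whose positional order is (images, text)."""

    def __call__(
        self, images: Optional[list] = None, text: Optional[list] = None
    ) -> Tuple[Optional[list], Optional[list]]:
        # Invented sanity check: every prompt must be a string.
        if text is not None and not all(isinstance(t, str) for t in text):
            raise TypeError("text must be a list of strings")
        return images, text


processor = ToyProcessor()
prompts = ["<image> detailed description:"]
media = [["fake-image-object"]]  # placeholder standing in for a PIL image

# Positional call in (prompts, media) order: the values land in the wrong slots.
try:
    processor(prompts, media)
    positional_ok = True
except TypeError:
    positional_ok = False

# Keyword call: the declaration order no longer matters.
images, text = processor(text=prompts, images=media)
```

In this toy setup the positional call fails while the keyword call returns the prompts and images in the expected slots.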
Hello,
I believe this is related to this issue: there also seems to be a problem with `Idefics3ForConditionalGeneration`.
Passing an image as input raises an `IndexError` (list index out of range):
Model load:

```python
import outlines
from transformers import Idefics3ForConditionalGeneration
from transformers.image_utils import load_image

# Load Idefics3 through the Outlines vision wrapper.
model = outlines.models.transformers_vision(
    "HuggingFaceM4/Idefics3-8B-Llama3",
    model_class=Idefics3ForConditionalGeneration,
    device="cuda",
)

# One prompt, with one list of images for that prompt.
description_generator = outlines.generate.text(model)
description_generator(
    [" detailed description:"],
    [[load_image("./image1.jpg")]],
)
```
Error:

```
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[4], line 18
     15 from transformers.image_utils import load_image
     17 description_generator = outlines.generate.text(model)
---> 18 description_generator(
     19     [" detailed description:"],
     20     [[load_image("Image.JPG")]]
     21 )

File ~/envs/default/lib/python3.10/site-packages/outlines/generate/api.py:556, in VisionSequenceGeneratorAdapter.__call__(self, prompts, media, max_tokens, stop_at, seed, **model_specific_params)
    550 prompts, media = self._validate_prompt_media_types(prompts, media)
    552 generation_params = self.prepare_generation_parameters(
    553     max_tokens, stop_at, seed
    554 )
--> 556 completions = self.model.generate(
    557     prompts,
    558     media,
    559     generation_params,
    560     copy(self.logits_processor),
    561     self.sampling_params,
    562     **model_specific_params,
    563 )
    565 return self._format(completions)

File ~/envs/default/lib/python3.10/site-packages/outlines/models/transformers_vision.py:46, in TransformersVision.generate(self, prompts, media, generation_parameters, logits_processor, sampling_parameters)
     15 def generate(  # type: ignore
     16     self,
     17     prompts: Union[str, List[str]],
   (...)
     21     sampling_parameters: SamplingParameters,
     22 ) -> Union[str, List[str], List[List[str]]]:
     23     """Generate text using `transformers`.
     24
     25     Arguments
   (...)
     44     The generated text
     45     """
---> 46     inputs = self.processor(
     47         text=prompts, images=media, padding=True, return_tensors="pt"
     48     ).to(self.model.device)
     50     generation_kwargs = self._get_generation_kwargs(
     51         prompts,
     52         generation_parameters,
     53         logits_processor,
     54         sampling_parameters,
     55     )
     56     generated_ids = self._generate_output_seq(prompts, inputs, **generation_kwargs)

File ~/envs/default/lib/python3.10/site-packages/transformers/models/idefics3/processing_idefics3.py:302, in Idefics3Processor.__call__(self, images, text, audio, videos, image_seq_len, **kwargs)
    300 sample = split_sample[0]
    301 for i, image_prompt_string in enumerate(image_prompt_strings):
--> 302     sample += image_prompt_string + split_sample[i + 1]
    303 prompt_strings.append(sample)
    305 text_inputs = self.tokenizer(text=prompt_strings, **output_kwargs["text_kwargs"])

IndexError: list index out of range
```
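For readers unfamiliar with the loop in the last frame, the following is a simplified sketch of how such a loop can run out of range. It is not the library code and not a diagnosis of this report. The function name `expand_prompt`, the `"[IMG]"` expansion string, and the `"<image>"` placeholder are assumptions made for illustration. The sketch only shows that indexing `split_sample[i + 1]` requires at least one placeholder in the prompt per image.

```python
IMAGE_TOKEN = "<image>"  # assumed placeholder string, for illustration only


def expand_prompt(sample: str, image_prompt_strings: list) -> str:
    """Interleave per-image strings with the text between placeholders."""
    split_sample = sample.split(IMAGE_TOKEN)
    expanded = split_sample[0]
    for i, image_prompt_string in enumerate(image_prompt_strings):
        # Requires len(split_sample) >= len(image_prompt_strings) + 1.
        expanded += image_prompt_string + split_sample[i + 1]
    return expanded


# One placeholder and one image: the pieces line up.
ok = expand_prompt("<image> detailed description:", ["[IMG]"])

# No placeholder but one image: split_sample has a single element,
# so split_sample[1] is out of range.
try:
    expand_prompt(" detailed description:", ["[IMG]"])
    failed = False
except IndexError:
    failed = True
```

Under these assumptions, a prompt with fewer placeholders than images is enough to trigger the same exception type.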
Describe the issue as clearly as possible:
I am trying to run Outlines together with the new Idefics3 vision language model and am having trouble with that. I am cross-posting this as a comment in the Idefics3 PR here because I am not sure which side is supposed to change its code.
Roughly following the instructions in the doc here (using `load_image` rather than the `image_from_url` defined there), I am trying the code below and get the error below. With some print statements, I found that the "<" in "Got <." in the error message is actually the first character of the prompt "<image> detailed description:", which means that each character in the prompt list gets misinterpreted as an image URL.
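The character-by-character behaviour can be reproduced in isolation. The sketch below uses a hypothetical loader (`toy_load_image` and its error text are stand-ins, not the `transformers` implementation) and assumes the prompt starts with "<", as the error message suggests. When a prompt string ends up where a list of images is expected, iterating over it yields single characters, and the first one is rejected.

```python
def toy_load_image(source: str) -> str:
    """Hypothetical loader: accepts only http(s) URLs, rejects anything else."""
    if source.startswith(("http://", "https://")):
        return f"image from {source}"
    raise ValueError(f"Incorrect image source. Got {source}.")


def load_all(images) -> list:
    # Expects a list of lists of image sources, one inner list per prompt.
    # If an inner "list" is really a str, iterating it yields characters.
    return [[toy_load_image(item) for item in sample] for sample in images]


prompts = ["<image> detailed description:"]

# The prompt list is passed where the nested image list was expected.
try:
    load_all(prompts)
    message = ""
except ValueError as err:
    message = str(err)

print(message)
```

The reported value is the single character "<" rather than the whole prompt, which matches the symptom described above.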
Steps/code to reproduce the bug:
Expected result:
Error message:
Outlines/Python version information:
master branch, github PR for idefics3 https://github.com/huggingface/transformers/pull/32473
root@C.12143599:/$ python -c "from outlines import _version; print(_version.version)" 0.0.47.dev62+g900762b root@C.12143599:/$ python -c "import sys; print('Python', sys.version)" Python 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0] root@C.12143599:/$ pip freeze accelerate==0.33.0 aiohappyeyeballs==2.4.0 aiohttp==3.10.5 aiosignal==1.3.1 annotated-types==0.7.0 anyio==4.3.0 archspec @ file:///croot/archspec_1697725767277/work argon2-cffi==23.1.0 argon2-cffi-bindings==21.2.0 arrow==1.3.0 asttokens @ file:///opt/conda/conda-bld/asttokens_1646925590279/work astunparse==1.6.3 async-lru==2.0.4 async-timeout==4.0.3 attrs @ file:///croot/attrs_1695717823297/work Babel==2.14.0 bash_kernel==0.9.3 beautifulsoup4 @ file:///croot/beautifulsoup4-split_1681493039619/work bleach==6.1.0 boltons @ file:///croot/boltons_1677628692245/work Brotli @ file:///tmp/abs_ecyw11_7ze/croots/recipe/brotli-split_1659616059936/work certifi @ file:///croot/certifi_1707229174982/work/certifi cffi @ file:///croot/cffi_1700254295673/work chardet @ file:///home/builder/ci_310/chardet_1640804867535/work charset-normalizer @ file:///tmp/build/80754af9/charset-normalizer_1630003229654/work click @ file:///croot/click_1698129812380/work cloudpickle==3.0.0 comm==0.2.1 conda @ file:///croot/conda_1696257509808/work conda-build @ file:///croot/conda-build_1708025865815/work conda-content-trust @ file:///croot/conda-content-trust_1693490622020/work conda-libmamba-solver @ file:///croot/conda-libmamba-solver_1691418897561/work/src conda-package-handling @ file:///croot/conda-package-handling_1690999929514/work conda_index @ file:///croot/conda-index_1706633791028/work conda_package_streaming @ file:///croot/conda-package-streaming_1690987966409/work cryptography @ file:///croot/cryptography_1707523700518/work datasets==2.21.0 debugpy==1.8.1 decorator @ file:///opt/conda/conda-bld/decorator_1643638310831/work defusedxml==0.7.1 dill==0.3.8 diskcache==5.6.3 distro @ file:///croot/distro_1701455004953/work 
dnspython==2.6.1 exceptiongroup @ file:///croot/exceptiongroup_1706031385326/work executing @ file:///opt/conda/conda-bld/executing_1646925071911/work expecttest==0.2.1 fastjsonschema==2.19.1 filelock @ file:///croot/filelock_1700591183607/work fqdn==1.5.1 frozenlist==1.4.1 fsspec==2024.2.0 gmpy2 @ file:///tmp/build/80754af9/gmpy2_1645455533097/work h11==0.14.0 httpcore==1.0.4 httpx==0.27.0 huggingface-hub==0.24.6 hypothesis==6.98.10 idna @ file:///croot/idna_1666125576474/work iniconfig==2.0.0 interegular==0.3.3 ipykernel==6.29.2 ipython @ file:///croot/ipython_1704833016303/work ipywidgets==8.1.2 isoduration==20.11.0 jedi @ file:///tmp/build/80754af9/jedi_1644315229345/work Jinja2 @ file:///croot/jinja2_1706733616596/work json5==0.9.17 jsonpatch @ file:///tmp/build/80754af9/jsonpatch_1615747632069/work jsonpointer==2.1 jsonschema @ file:///croot/jsonschema_1699041609003/work jsonschema-specifications @ file:///croot/jsonschema-specifications_1699032386549/work jupyter==1.0.0 jupyter-archive==3.4.0 jupyter-console==6.6.3 jupyter-events==0.9.0 jupyter-http-over-ws==0.0.8 jupyter-lsp==2.2.2 jupyter_client==8.6.0 jupyter_core==5.7.1 jupyter_server==2.12.5 jupyter_server_terminals==0.5.2 jupyterlab==4.1.2 jupyterlab_pygments==0.3.0 jupyterlab_server==2.25.3 jupyterlab_widgets==3.0.10 lark==1.2.2 libarchive-c @ file:///tmp/build/80754af9/python-libarchive-c_1617780486945/work libmambapy @ file:///croot/mamba-split_1698782620632/work/libmambapy llvmlite==0.43.0 MarkupSafe @ file:///croot/markupsafe_1704205993651/work matplotlib-inline @ file:///opt/conda/conda-bld/matplotlib-inline_1662014470464/work menuinst @ file:///croot/menuinst_1706732933928/work mistune==3.0.2 mkl-fft @ file:///croot/mkl_fft_1695058164594/work mkl-random @ file:///croot/mkl_random_1695059800811/work mkl-service==2.4.0 more-itertools @ file:///croot/more-itertools_1700662129964/work mpmath @ file:///croot/mpmath_1690848262763/work multidict==6.0.5 multiprocess==0.70.16 nbclient==0.9.0 
nbconvert==7.16.1 nbformat==5.9.2 nbzip==0.1.0 nest-asyncio==1.6.0 networkx @ file:///croot/networkx_1690561992265/work notebook==7.1.0 notebook_shim==0.2.4 numba==0.60.0 numpy @ file:///croot/numpy_and_numpy_base_1704311704800/work/dist/numpy-1.26.3-cp310-cp310-linux_x86_64.whl#sha256=a281f24b826e51f1c25bdd24960ab44b4bc294c65d81560441ba7fffd8ddd2a7 optree==0.10.0 outlines @ git+https://github.com/outlines-dev/outlines.git@900762b0f240c6220549d7d8594a221e6c9845e8 overrides==7.7.0 packaging @ file:///croot/packaging_1693575174725/work pandas==2.2.2 pandocfilters==1.5.1 parso @ file:///opt/conda/conda-bld/parso_1641458642106/work pexpect @ file:///tmp/build/80754af9/pexpect_1605563209008/work pillow @ file:///croot/pillow_1707233021655/work pkginfo @ file:///croot/pkginfo_1679431160147/work platformdirs @ file:///croot/platformdirs_1692205439124/work pluggy==1.4.0 prometheus_client==0.20.0 prompt-toolkit @ file:///croot/prompt-toolkit_1704404351921/work psutil @ file:///opt/conda/conda-bld/psutil_1656431268089/work ptyprocess @ file:///tmp/build/80754af9/ptyprocess_1609355006118/work/dist/ptyprocess-0.7.0-py2.py3-none-any.whl pure-eval @ file:///opt/conda/conda-bld/pure_eval_1646925070566/work pyairports==2.1.1 pyarrow==17.0.0 pycosat @ file:///croot/pycosat_1696536503704/work pycountry==24.6.1 pycparser @ file:///tmp/build/80754af9/pycparser_1636541352034/work pydantic==2.8.2 pydantic_core==2.20.1 Pygments @ file:///croot/pygments_1684279966437/work pyOpenSSL @ file:///croot/pyopenssl_1708380408460/work PySocks @ file:///home/builder/ci_310/pysocks_1640793678128/work pytest==8.0.1 python-dateutil==2.8.2 python-etcd==0.4.5 python-json-logger==2.0.7 pytz @ file:///croot/pytz_1695131579487/work PyYAML @ file:///croot/pyyaml_1698096049011/work pyzmq==25.1.2 qtconsole==5.5.1 QtPy==2.4.1 referencing @ file:///croot/referencing_1699012038513/work regex==2024.7.24 requests==2.32.3 rfc3339-validator==0.1.4 rfc3986-validator==0.1.1 rpds-py @ 
file:///croot/rpds-py_1698945930462/work ruamel.yaml @ file:///croot/ruamel.yaml_1666304550667/work ruamel.yaml.clib @ file:///croot/ruamel.yaml.clib_1666302247304/work safetensors==0.4.4 Send2Trash==1.8.2 six @ file:///tmp/build/80754af9/six_1644875935023/work sniffio==1.3.0 sortedcontainers==2.4.0 soupsieve @ file:///croot/soupsieve_1696347547217/work stack-data @ file:///opt/conda/conda-bld/stack_data_1646927590127/work sympy @ file:///croot/sympy_1701397643339/work terminado==0.18.0 tinycss2==1.2.1 tokenizers==0.19.1 tomli @ file:///opt/conda/conda-bld/tomli_1657175507142/work toolz @ file:///croot/toolz_1667464077321/work torch==2.2.1 torchaudio==2.2.1 torchelastic==0.2.2 torchvision==0.17.1 tornado==6.4 tqdm==4.66.5 traitlets @ file:///croot/traitlets_1671143879854/work transformers @ git+https://github.com/huggingface/transformers.git@046c88ea50562bd4bdc3798c9f1e4ab84a6e4b13 triton==2.2.0 truststore @ file:///croot/truststore_1695244293384/work types-dataclasses==0.6.6 types-python-dateutil==2.8.19.20240106 typing_extensions @ file:///croot/typing_extensions_1705599297034/work tzdata==2024.1 uri-template==1.3.0 urllib3 @ file:///croot/urllib3_1707770551213/work wcwidth @ file:///Users/ktietz/demo/mc3/conda-bld/wcwidth_1629357192024/work webcolors==1.13 webencodings==0.5.1 websocket-client==1.7.0 widgetsnbextension==4.0.10 xxhash==3.5.0 yarl==1.9.4 zstandard @ file:///croot/zstandard_1677013143055/work root@C.12143599:/$
Context for the issue:
Idefics3 is one of the first VLMs based on LLaMA 3.1, which is a significant improvement over LLaMA 3.