geekan / MetaGPT

🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming
https://deepwisdom.ai/
MIT License

Unable to run llm_vision.py example / pass any images to Action._aask() #1461

Closed tanishq-acu closed 1 month ago

tanishq-acu commented 1 month ago

Running the llm_vision.py example, or creating any Role whose action calls Action._aask() with a base64-encoded image, results in an error. It looks like it comes from how the OpenAI API request is assembled, though it could be something in my configuration.

Here is the specific error:

Exception: Traceback (most recent call last):
  File "/Users/tanishq/Documents/MetaGPT/MetaGPT/metagpt/utils/common.py", line 640, in wrapper
    return await func(self, *args, **kwargs)
  File "/Users/tanishq/Documents/MetaGPT/MetaGPT/metagpt/roles/role.py", line 550, in run
    rsp = await self.react()
  File "/Users/tanishq/Documents/MetaGPT/MetaGPT/metagpt/roles/role.py", line 519, in react
    rsp = await self._react()
  File "/Users/tanishq/Documents/MetaGPT/MetaGPT/metagpt/roles/role.py", line 474, in _react
    rsp = await self._act()
  File "/Users/tanishq/Documents/MetaGPT/WebSummarizer/WebScraper.py", line 263, in _act
    result= await (AnswerQuestion().run(msg.content, self.image_base64))
  File "/Users/tanishq/Documents/MetaGPT/WebSummarizer/WebScraper.py", line 43, in run
    result = await self._aask(query, image_id)
  File "/Users/tanishq/Documents/MetaGPT/WebSummarizer/WebScraper.py", line 67, in _aask
    rsp = await self.llm.acompletion_text(message, stream=stream, timeout = self.llm.get_timeout(USE_CONFIG_TIMEOUT))
  File "/Users/tanishq/.pyenv/versions/3.10.11/lib/python3.10/site-packages/tenacity/_asyncio.py", line 88, in async_wrapped
    return await fn(*args, **kwargs)
  File "/Users/tanishq/.pyenv/versions/3.10.11/lib/python3.10/site-packages/tenacity/_asyncio.py", line 47, in __call__
    do = self.iter(retry_state=retry_state)
  File "/Users/tanishq/.pyenv/versions/3.10.11/lib/python3.10/site-packages/tenacity/__init__.py", line 314, in iter
    return fut.result()
  File "/Users/tanishq/.pyenv/versions/3.10.11/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/Users/tanishq/.pyenv/versions/3.10.11/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/Users/tanishq/.pyenv/versions/3.10.11/lib/python3.10/site-packages/tenacity/_asyncio.py", line 50, in __call__
    result = await fn(*args, **kwargs)
  File "/Users/tanishq/Documents/MetaGPT/MetaGPT/metagpt/provider/openai_api.py", line 158, in acompletion_text
    return await self._achat_completion_stream(messages, timeout=timeout)
  File "/Users/tanishq/Documents/MetaGPT/MetaGPT/metagpt/provider/openai_api.py", line 90, in _achat_completion_stream
    response: AsyncStream[ChatCompletionChunk] = await self.aclient.chat.completions.create(
  File "/Users/tanishq/.pyenv/versions/3.10.11/lib/python3.10/site-packages/openai/resources/chat/completions.py", line 1295, in create
    return await self._post(
  File "/Users/tanishq/.pyenv/versions/3.10.11/lib/python3.10/site-packages/openai/_base_client.py", line 1536, in post
    return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)
  File "/Users/tanishq/.pyenv/versions/3.10.11/lib/python3.10/site-packages/openai/_base_client.py", line 1315, in request
    return await self._request(
  File "/Users/tanishq/.pyenv/versions/3.10.11/lib/python3.10/site-packages/openai/_base_client.py", line 1378, in _request
    return await self._retry_request(
  File "/Users/tanishq/.pyenv/versions/3.10.11/lib/python3.10/site-packages/openai/_base_client.py", line 1418, in _retry_request
    return await self._request(
  File "/Users/tanishq/.pyenv/versions/3.10.11/lib/python3.10/site-packages/openai/_base_client.py", line 1378, in _request
    return await self._retry_request(
  File "/Users/tanishq/.pyenv/versions/3.10.11/lib/python3.10/site-packages/openai/_base_client.py", line 1418, in _retry_request
    return await self._request(
  File "/Users/tanishq/.pyenv/versions/3.10.11/lib/python3.10/site-packages/openai/_base_client.py", line 1392, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.InternalServerError: unable to extract input for provider 'openai-api' for pipeline 'text/completions': unable to run extractor 'text/completions': unable to map lua output to Response: 2 error(s) decoding:

* 'Data[0]' expected type 'uint8', got unconvertible type 'map[interface {}]interface {}', value: 'map[text:What is in the image? type:text]'
* 'Data[1]' expected type 'uint8', got unconvertible type 'map[interface {}]interface {}', value: 'map[image_url:map[url:data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAASABIAAD/4QBMRXhpZgAATU0AKgAAAAgAAYdpAAQAAAABAAAAGgAAAAAAA6ABAAMAAAABAAEAAKACAAQAAAABAAAAZKADAAQAAAABAAAAZAAAAAD/wAARCABkAGQDASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL. . . ]

I believe the message/request is being assembled incorrectly for the specific case of passing an image alongside a text query to the LLM provider.
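For reference, the payload shape the OpenAI chat completions API expects for vision input is a list of content parts, and the two fragments quoted in the error ('Data[0]' and 'Data[1]') match those two parts exactly. A minimal sketch (the base64 string is a truncated placeholder):

```python
# Multimodal message format for vision-capable models (e.g. gpt-4o) on the
# OpenAI chat.completions endpoint: "content" is a list of typed parts,
# not a plain string.
image_base64 = "/9j/4AAQSkZJRg..."  # placeholder, truncated base64 JPEG

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in the image?"},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{image_base64}"},
            },
        ],
    }
]
```

Since the error shows these two parts arriving intact, the list itself seems well-formed; the failure looks like the provider-side pipeline (`text/completions`) decoding `content` as a plain string rather than as a list of parts.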

This is with: Python 3.10.11, OpenAI gpt-4o / gpt-4o-mini, latest version of MetaGPT.

iorisa commented 1 month ago
[Screenshot 2024-08-23 18:49:01]

Python 3.10.14, OpenAI gpt-4o, latest version of MetaGPT.

I guess the `url: data:image/jpeg;base64,...` data URL is causing the problem.
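One way to test that guess: if the endpoint being hit only accepts string content, flattening the multimodal parts to plain text should make the same request succeed. This is a diagnostic only, not a fix, since it drops the image entirely. A hypothetical helper, assuming the message shape shown in the error:

```python
def flatten_content(messages):
    """Replace list-style multimodal content with a plain text string,
    keeping only the text parts. Diagnostic only: the image is dropped."""
    flat = []
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, list):
            texts = [part["text"] for part in content if part.get("type") == "text"]
            msg = {**msg, "content": "\n".join(texts)}
        flat.append(msg)
    return flat


messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in the image?"},
            {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}},
        ],
    }
]
print(flatten_content(messages))
```

If the flattened request goes through while the original fails, that would confirm the endpoint (or an intermediate proxy) is routing to a text-only pipeline rather than rejecting the data URL itself.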