AnswerDotAI / claudette

Claudette is Claude's friend
https://claudette.answer.ai/
Apache License 2.0
138 stars 22 forks

Is there a way to have a tool return image bytes? #10

Open Taytay opened 2 months ago

Taytay commented 2 months ago

First, thanks for claudette. It's very elegant.

toolslm doesn't appear to map the "bytes" type to anything, which I guess makes sense. However, that means I can't use toolloop once I include my image-returning function in the tools list.


def get_image_of_puppy() -> bytes:
    "Returns an image of a puppy"
    image: Path = Path("puppy.jpg")
    return image.read_bytes()

tools = [get_image_of_puppy]

chat = Chat(model, tools=tools)
r = chat.toolloop("Describe the puppy image")
print(contents(r))

Error:

Traceback (most recent call last):
  File "/Users/taytay/projects/llm-browser-driver/src/llm_browser_driver/playground/./test_claude.py", line 28, in <module>
    r = chat.toolloop("Describe the puppy image")
  File "/Users/taytay/projects/llm-browser-driver/.venv/lib/python3.10/site-packages/claudette/toolloop.py", line 23, in toolloop
    r = self(pr, **kwargs)
  File "/Users/taytay/projects/llm-browser-driver/.venv/lib/python3.10/site-packages/claudette/core.py", line 200, in __call__
    if self.tools: kw['tools'] = [get_schema(o) for o in self.tools]
  File "/Users/taytay/projects/llm-browser-driver/.venv/lib/python3.10/site-packages/claudette/core.py", line 200, in <listcomp>
    if self.tools: kw['tools'] = [get_schema(o) for o in self.tools]
  File "/Users/taytay/projects/llm-browser-driver/.venv/lib/python3.10/site-packages/toolslm/funccall.py", line 43, in get_schema
    if ret.anno is not empty: desc += f'\n\nReturns:\n- type: {_types(ret.anno)[0]}'
  File "/Users/taytay/projects/llm-browser-driver/.venv/lib/python3.10/site-packages/toolslm/funccall.py", line 20, in _types
    else: return tmap[t], None
KeyError: <class 'bytes'>

Which of course comes from:

# %% ../01_funccall.ipynb 11
def _types(t:type)->tuple[str,Optional[str]]:
    "Tuple of json schema type name and (if appropriate) array item name."
    if t is empty: raise TypeError('Missing type')
    tmap = {int:"integer", float:"number", str:"string", bool:"boolean", list:"array", dict:"object"}
    if getattr(t, '__origin__', None) in  (list,tuple): return "array", tmap.get(t.__args__[0], "object")
    else: return tmap[t], None
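To make the failure concrete, here is a minimal standalone sketch of the lookup quoted above. It does not import toolslm; `TMAP`, `types_strict`, and `types_lenient` are hypothetical names used only for illustration, and the lenient fallback is one possible workaround, not something toolslm is known to provide.

```python
from inspect import Parameter
from typing import Optional

empty = Parameter.empty

# Local copy of the type map quoted above; nothing here imports toolslm.
TMAP = {int: "integer", float: "number", str: "string",
        bool: "boolean", list: "array", dict: "object"}

def types_strict(t) -> tuple[str, Optional[str]]:
    "Same lookup as the quoted `_types`: an unmapped type raises KeyError."
    if t is empty:
        raise TypeError("Missing type")
    if getattr(t, "__origin__", None) in (list, tuple):
        return "array", TMAP.get(t.__args__[0], "object")
    return TMAP[t], None

def types_lenient(t, default: str = "object") -> tuple[str, Optional[str]]:
    "Hypothetical variant: an unmapped type falls back to `default`."
    try:
        return types_strict(t)
    except KeyError:
        return default, None

print(types_strict(list[int]))   # ('array', 'integer')
print(types_lenient(bytes))      # ('object', None)
```

Calling `types_strict(bytes)` raises `KeyError: <class 'bytes'>`, which matches the last line of the traceback. A fallback like this would only get past schema generation; the tool result itself would still need to be sent as an image block.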

I tried returning a dict in the image format instead, but Claude complains that there are too many input tokens. I need the image to go back into the chat as bytes, so that claudette's own machinery kicks in and converts it into an image block that Claude understands:

def img_msg(data:bytes)->dict:
    "Convert image `data` into an encoded `dict`"
    img = base64.b64encode(data).decode("utf-8")
    mtype = mimetypes.types_map['.'+imghdr.what(None, h=data)]
    r = dict(type="base64", media_type=mtype, data=img)
    return {"type": "image", "source": r}
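One caveat about the helper quoted above: `imghdr` was deprecated in Python 3.11 and removed in 3.13. Below is a minimal sketch of the same conversion that sniffs the leading magic bytes instead. `sniff_media_type` and `img_block` are hypothetical names, and the sketch assumes that only JPEG, PNG, GIF, and WebP need to be recognized.

```python
import base64

def sniff_media_type(data: bytes) -> str:
    "Guess the media type from the leading magic bytes (four formats only)."
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "image/gif"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    raise ValueError("Unrecognized image format")

def img_block(data: bytes) -> dict:
    "Build a base64 image content block shaped like the output of `img_msg`."
    source = {
        "type": "base64",
        "media_type": sniff_media_type(data),
        "data": base64.b64encode(data).decode("utf-8"),
    }
    return {"type": "image", "source": source}
```

Raising on unknown data, rather than defaulting to a media type, keeps a non-image `bytes` result from being silently sent as an image.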

I could probably hack it together, but I keep feeling like I'm missing something obvious?

jph00 commented 2 months ago

Are there any examples in the Anthropic docs of a tool returning something that's not a string? If so, please provide a link and we'll try to get it working. Or alternatively, if you're able to show an example that works with the plain Anthropic class, link to a gist or repo so we can see what's needed.

Taytay commented 2 months ago

I think this is the example from their docs that covers returning an image: https://docs.anthropic.com/en/docs/build-with-claude/tool-use#example-of-tool-result-with-images

Excerpt

{
  "role": "user",
  "content": [
    {
      "type": "tool_result",
      "tool_use_id": "toolu_01A09q90qw90lq917835lq9",
      "content": [
        {"type": "text", "text": "15 degrees"},
        {
          "type": "image",
          "source": {
            "type": "base64",
            "media_type": "image/jpeg",
            "data": "/9j/4AAQSkZJRg...",
          }
        }
      ]
    }
  ]
}
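For reference, a message of the shape shown in the excerpt could be assembled with a small helper along these lines. This is only a sketch: `image_tool_result` is a hypothetical function that is not part of claudette or the Anthropic SDK, and the caller is assumed to already know the media type.

```python
import base64
from typing import Optional

def image_tool_result(tool_use_id: str, data: bytes, media_type: str,
                      text: Optional[str] = None) -> dict:
    "Build a user message holding one tool_result with optional text plus an image."
    content = []
    if text is not None:
        content.append({"type": "text", "text": text})
    content.append({
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(data).decode("utf-8"),
        },
    })
    result = {"type": "tool_result", "tool_use_id": tool_use_id, "content": content}
    return {"role": "user", "content": [result]}

# Hypothetical usage with the file name from the original post:
# msg = image_tool_result(tool_use_id, Path("puppy.jpg").read_bytes(), "image/jpeg")
```

The point to notice is that the `content` of the `tool_result` is a list of blocks rather than a plain string, which is what lets an image travel back to the model.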

I'll try to hack on this more too when I get a chance and will let you know if I find a good way.

Taytay commented 2 months ago

Okay - got it working. (Thanks to @patch for making this easy!)

Feel free to make this more elegant! :)


import inspect
import logging
import os

os.environ["ANTHROPIC_LOG"] = "debug"

from pathlib import Path
from typing import Optional

import claudette.core
import toolslm.funccall
from claudette import Chat, contents
from claudette.core import ToolUseBlock, _mk_ns, abc, img_msg
from fastcore.utils import patch_to

empty = inspect.Parameter.empty

@patch_to(toolslm.funccall)
def _types(t: type) -> tuple[str, Optional[str]]:
    "Tuple of json schema type name and (if appropriate) array item name."
    if t is empty:
        raise TypeError("Missing type")
    tmap = {
        int: "integer",
        float: "number",
        str: "string",
        bool: "boolean",
        list: "array",
        dict: "object",
        # Bytes is assumed to be an image for now
        # We could likely add a better type to indicate this
        bytes: {
            "type": "object",
            "properties": {
                "type": {"type": "string", "enum": ["image"]},
                "source": {
                    "type": "object",
                    "properties": {
                        "type": {"type": "string", "enum": ["base64"]},
                        "media_type": {"type": "string"},
                        "data": {"type": "string"},
                    },
                    "required": ["type", "media_type", "data"],
                },
            },
        },
    }
    if getattr(t, "__origin__", None) in (list, tuple):
        return "array", tmap.get(t.__args__[0], "object")
    else:
        return tmap[t], None

@patch_to(claudette.core)
def call_func(fc: ToolUseBlock, ns: Optional[abc.Mapping] = None, obj: Optional = None):
    "Call the function in the tool use block `fc`, using namespace `ns`."
    if ns is None:
        ns = globals()
    if not isinstance(ns, abc.Mapping):
        ns = _mk_ns(*ns)
    func = getattr(obj, fc.name, None)
    if not func:
        func = ns[fc.name]
    res = func(**fc.input)
    if isinstance(res, bytes):
        # If the result is bytes, assume it's an image
        return dict(type="tool_result", tool_use_id=fc.id, content=[img_msg(res)])
    return dict(type="tool_result", tool_use_id=fc.id, content=str(res))

def get_image_of_puppy() -> bytes:
    "Returns an image of a puppy"
    image: Path = Path("samples/puppy.jpeg")
    return image.read_bytes()

def get_object_and_properties() -> dict:
    "Returns a dict with a couple of integer properties called x and y"
    return {"x": 1, "y": 2}

def get_str() -> str:
    "Returns a random string"
    return "foo!"

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)

    tools = [get_image_of_puppy, get_object_and_properties, get_str]

    chat = Chat("claude-3-5-sonnet-20240620", tools=tools)
    r = chat.toolloop(
        "Tell me what tools you have access to please, and what you expect each of them to return to you. Then, examine and describe the puppy"
    )
    print(contents(r))

Full gist of response here: https://gist.github.com/Taytay/7191d5f5722d3ed8c000a938e11b26cd
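The comment in the patch notes that treating every `bytes` result as an image is a stopgap. One possible direction, sketched here under assumptions, is an explicit wrapper type that a tool returns when it means "this is an image". `ToolImage` and `to_tool_result` are hypothetical names and not part of claudette or toolslm.

```python
import base64
from dataclasses import dataclass

@dataclass(frozen=True)
class ToolImage:
    "Hypothetical marker: raw image bytes plus their media type."
    data: bytes
    media_type: str = "image/jpeg"

def to_tool_result(tool_use_id: str, res) -> dict:
    "Wrap a tool's return value: ToolImage becomes an image block, anything else a string."
    if isinstance(res, ToolImage):
        content = [{
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": res.media_type,
                "data": base64.b64encode(res.data).decode("utf-8"),
            },
        }]
    else:
        content = str(res)
    return dict(type="tool_result", tool_use_id=tool_use_id, content=content)
```

In this sketch a plain `bytes` return value is stringified like any other result, so a tool has to opt in by returning `ToolImage(...)`. The schema side would still need a mapping for the wrapper type, as the `_types` patch above does for `bytes`.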