Mirascope / mirascope

LLM abstractions that aren't obstructions
https://docs.mirascope.io/
MIT License

Pydantic validation error when using dicts content in user message #341

Closed off6atomic closed 1 month ago

off6atomic commented 3 months ago

Description

I ran the following code:

import pprint

from mirascope.openai import OpenAICall
from openai.types.chat import ChatCompletionMessageParam

class Librarian(OpenAICall):
    prompt_template = """
    SYSTEM: You are the world's greatest librarian.
    MESSAGES: {history}
    """

    history: list[ChatCompletionMessageParam] = []

history = [
    {
        "role": "user",
        "content": [{"type": "text", "text": "What fantasy book should I read?"}],
    },
]
librarian = Librarian(history=history)
pprint.pprint(librarian.messages(), indent=2)

And got this message:

[ {'content': "You are the world's greatest librarian.", 'role': 'system'},
  { 'content': ValidatorIterator(index=0, schema=Some(Union(UnionValidator { mode: Smart, choices: [(TypedDict(TypedDictValidator { fields: [TypedDictField { name: "text", lookup_key: Simple { key: "text", py_key: Py(0x117aa4fb0), path: LookupPath([S("text", Py(0x1179e4af0))]) }, name_py: Py(0x104520b70), required: true, validator: Str(StrValidator { strict: false, coerce_numbers_to_str: false }) }, TypedDictField { name: "type", lookup_key: Simple { key: "type", py_key: Py(0x1179e4db0), path: LookupPath([S("type", Py(0x1179e52f0))]) }, name_py: Py(0x104500d70), required: true, validator: Literal(LiteralValidator { lookup: LiteralLookup { expected_bool: None, expected_int: None, expected_str: Some({"text": 0}), expected_py_dict: None, expected_py_list: None, values: [Py(0x104520b70)] }, expected_repr: "'text'", name: "literal['text']" }) }], extra_behavior: Ignore, extras_validator: None, strict: false, loc_by_alias: true }), None), (TypedDict(TypedDictValidator { fields: [TypedDictField { name: "image_url", lookup_key: Simple { key: "image_url", py_key: Py(0x117ac8b70), path: LookupPath([S("image_url", Py(0x117ac8bb0))]) }, name_py: Py(0x1168f7db0), required: true, validator: TypedDict(TypedDictValidator { fields: [TypedDictField { name: "url", lookup_key: Simple { key: "url", py_key: Py(0x1179f44f0), path: LookupPath([S("url", Py(0x117ac8ab0))]) }, name_py: Py(0x104728970), required: true, validator: Str(StrValidator { strict: false, coerce_numbers_to_str: false }) }, TypedDictField { name: "detail", lookup_key: Simple { key: "detail", py_key: Py(0x117ac8af0), path: LookupPath([S("detail", Py(0x117ac8b30))]) }, name_py: Py(0x107828530), required: false, validator: Literal(LiteralValidator { lookup: LiteralLookup { expected_bool: None, expected_int: None, expected_str: Some({"high": 2, "auto": 0, "low": 1}), expected_py_dict: None, expected_py_list: None, values: [Py(0x1046a95f0), Py(0x1049f0f70), Py(0x1049f0fb0)] }, expected_repr: "'auto', 'low' or 'high'", name: 
"literal['auto','low','high']" }) }], extra_behavior: Ignore, extras_validator: None, strict: false, loc_by_alias: true }) }, TypedDictField { name: "type", lookup_key: Simple { key: "type", py_key: Py(0x117ac8bf0), path: LookupPath([S("type", Py(0x117ac8c30))]) }, name_py: Py(0x104500d70), required: true, validator: Literal(LiteralValidator { lookup: LiteralLookup { expected_bool: None, expected_int: None, expected_str: Some({"image_url": 0}), expected_py_dict: None, expected_py_list: None, values: [Py(0x1168f7db0)] }, expected_repr: "'image_url'", name: "literal['image_url']" }) }], extra_behavior: Ignore, extras_validator: None, strict: false, loc_by_alias: true }), None)], custom_error: None, strict: false, name: "union[typed-dict,typed-dict]" }))),
    'role': 'user'}]

Notice the ValidatorIterator in the user message content. It should not be there, should it?

I strongly believe this is what causes logging with Logfire to not show the Chat Completion section.

By Chat Completion section, I mean the pretty log like in the following image:

[image: screenshot of the Logfire Chat Completion section]

Debugging tips

Python, Mirascope & OS Versions, related packages (not required)

mirascope=0.17.0
pydantic==2.7.1
python=3.10.11
os=Mac

willbakst commented 3 months ago

I believe this is the same issue as https://github.com/pydantic/pydantic/issues/9467

For now the answer is to add SkipValidation to your history:

from pydantic import SkipValidation
...

class Librarian(OpenAICall):
    prompt_template = """
    SYSTEM: You are the world's greatest librarian.
    MESSAGES: {history}
    """

    history: SkipValidation[list[ChatCompletionMessageParam]] = []

...

off6atomic commented 3 months ago

Thanks @willbakst. It solves the ValidatorIterator problem, but it doesn't solve the problem of the missing Chat Completion section.

Here is minimal code that reproduces the missing Chat Completion section:

import pprint

import logfire
from dotenv import load_dotenv
from mirascope.logfire import with_logfire
from mirascope.openai import OpenAICall
from openai.types.chat import ChatCompletionMessageParam
from pydantic import SkipValidation

load_dotenv()
logfire.configure()

@with_logfire
class Librarian(OpenAICall):
    prompt_template = """
    SYSTEM: You are the world's greatest librarian. You answer very concisely.
    MESSAGES: {history}
    """

    history: SkipValidation[list[ChatCompletionMessageParam]] = []

history = [
    {
        "role": "user",
        # uncomment the line below (and comment out the array version) to see the LLM Chat Completions section
        # "content": "What fantasy book should I read?",
        "content": [{"type": "text", "text": "What fantasy book should I read?"}],
    },
]
librarian = Librarian(history=history)
pprint.pprint(librarian.messages(), indent=2)
print()

print(librarian.call().content)

If you uncomment the string content field and comment out the array one, the LLM Chat Completions section shows up properly on Logfire.

What is the workaround for this bug?

willbakst commented 3 months ago

Oh, I believe this is an issue with Logfire's LLM integration: it doesn't handle a content array, only content given as a single string.

I will take a deeper look and likely post a bug on their repo if I can confirm it's a bug without Mirascope :)
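
Until that is sorted out, one possible stop-gap is to collapse text-only content arrays into a plain string before constructing the call. This is only a sketch under my own assumptions, not something confirmed by Logfire or Mirascope: `flatten_text_content` is a hypothetical helper, and it deliberately leaves messages with non-text parts (such as images) unchanged, so it would not help for mixed text and image content.

```python
from typing import Any


def flatten_text_content(message: dict[str, Any]) -> dict[str, Any]:
    """Collapse a text-only content array into a single string.

    Hypothetical helper: not part of Mirascope, Logfire, or the OpenAI SDK.
    Messages that do not need flattening are returned unchanged.
    """
    content = message.get("content")
    # Plain strings (or missing content) need no change.
    if not isinstance(content, list):
        return message
    # Leave the message alone if any part is not a well-formed text part
    # (for example an image_url part).
    for part in content:
        if not (
            isinstance(part, dict)
            and part.get("type") == "text"
            and isinstance(part.get("text"), str)
        ):
            return message
    # Build a new dict so the caller's original message is not mutated.
    return {**message, "content": "\n".join(part["text"] for part in content)}


history = [
    {
        "role": "user",
        "content": [{"type": "text", "text": "What fantasy book should I read?"}],
    },
]
flat_history = [flatten_text_content(m) for m in history]
print(flat_history)
```

The flattened list could then be passed as `history` in place of the original one.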

off6atomic commented 3 months ago

If it is buggy like that, then maybe it's not a good idea for me to use it, because I'll also have to show text along with images. Do you have any alternatives that you like?

I used LangSmith with LangChain in the past and it worked fine, but it doesn't allow arbitrary logging the way Logfire does.

Maybe LangFuse is a good alternative?

willbakst commented 3 months ago

I don’t believe LangFuse allows for arbitrary logging? I would have to check.

From my understanding of Logfire, the bug should be a simple UI fix to properly render the content. All of the content is still there; it's just not handled properly.

off6atomic commented 3 months ago

OK. I'll wait for them to fix the bug then. Thank you for helping me report the bug on their repo. I appreciate it very much.

off6atomic commented 3 months ago

@willbakst Has this bug already been reported on the upstream repo? I just want to keep track of it.

willbakst commented 3 months ago

@off6atomic thank you for the reminder, I totally blanked on posting this. It's posted now.

https://github.com/pydantic/logfire/issues/297

willbakst commented 1 month ago

I am closing this issue as it is entirely external.

For the Pydantic ValidatorIterator issue, the stop-gap measure is to use SkipValidation, which prevents iterables from getting converted into ValidatorIterator instances (since they won't be validated).
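
For anyone landing here later, the behaviour can be seen with Pydantic alone. The following is a minimal sketch, assuming Pydantic 2.x; the `Validated` and `Skipped` models are made up for illustration and are not part of Mirascope. A field typed as `Iterable[...]` is validated lazily, so the stored value is a wrapper object rather than the original list, while `SkipValidation` keeps the input as-is.

```python
from typing import Iterable

from pydantic import BaseModel, SkipValidation


class Validated(BaseModel):
    # Iterable fields are validated lazily, so the stored value is a
    # wrapper around the input rather than the input list itself.
    parts: Iterable[int]


class Skipped(BaseModel):
    # SkipValidation stores the input object without validating it.
    parts: SkipValidation[Iterable[int]]


validated = Validated(parts=[1, 2, 3])
skipped = Skipped(parts=[1, 2, 3])

# Name of the wrapper type produced by lazy validation.
validated_type = type(validated.parts).__name__
print(validated_type)

# The skipped field is still the plain list that was passed in.
print(skipped.parts)
```

The trade-off is that a skipped field gets no validation at all, so malformed messages would only surface later, for example when the request is sent.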

For the logfire UI issues, we can continue to track it in the issue I posted on their repo (see previous comment).