Mirascope / mirascope

LLM abstractions that aren't obstructions
https://mirascope.com/docs
MIT License

[BUG] - Extraction error when using GPT-4 #226

Closed ramon-prieto closed 5 months ago

ramon-prieto commented 5 months ago

To Reproduce
Steps to reproduce the behavior:

  1. Create a pydantic model and its extractor class:

```py
from typing import Literal, Type

from pydantic import BaseModel

from mirascope.openai import OpenAIExtractor


class TaskDetails(BaseModel):
    due_date: str
    description: str
    priority: Literal["low", "medium", "high"]


class TaskExtractor(OpenAIExtractor[TaskDetails]):
    extract_schema: Type[TaskDetails] = TaskDetails
    prompt_template = """
    Extract the task details from the following task:
    {task}
    """

    task: str
```

  2. Run the extraction with the model set to `gpt-4-turbo` or `gpt-4`. The failure happens whether the model is passed through kwargs or `call_params`:

```py
task = "Submit quarterly report. Task is high priority."
task_details = TaskExtractor(task=task).extract(**{"model": "gpt-4"})
```

Expected behavior
The extractor should return a populated `TaskDetails` model.

Stacktrace

```
{
    "name": "RetryError",
    "message": "RetryError[<Future at 0x10b88eae0 state=finished raised AttributeError>]",
    "stack": "---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
File ~/src/AIBoostrapper/trially/llm-nlp-pipelines/.venv/lib/python3.12/site-packages/mirascope/base/extractors.py:156, in BaseExtractor._extract(self, call_type, tool_type, retries, **kwargs)
    155 try:
--> 156     extraction = _extract_attempt(
    157         call_type, tool_type, error_messages, **kwargs
    158     )
    159 except (AttributeError, ValueError, ValidationError) as e:

File ~/src/AIBoostrapper/trially/llm-nlp-pipelines/.venv/lib/python3.12/site-packages/mirascope/base/extractors.py:144, in BaseExtractor._extract.<locals>._extract_attempt(call_type, tool_type, error_messages, **kwargs)
    143 if extracted_schema is None:
--> 144     raise AttributeError(\"No tool found in the completion.\")
    145 return extracted_schema

AttributeError: No tool found in the completion.

The above exception was the direct cause of the following exception:

RetryError                                Traceback (most recent call last)
Cell In[80], line 1
----> 1 task_details = TaskExtractor(task=task).extract(**{\"model\": \"gpt-4\"})

File ~/src/AIBoostrapper/trially/llm-nlp-pipelines/.venv/lib/python3.12/site-packages/mirascope/openai/extractors.py:84, in OpenAIExtractor.extract(self, retries, **kwargs)
     61 def extract(self, retries: Union[int, Retrying] = 0, **kwargs: Any) -> T:
     62     \"\"\"Extracts `extract_schema` from the OpenAI call response.
     63 
     64     The `extract_schema` is converted into an `OpenAITool`, complete with a
   (...)
     82             https://platform.openai.com/docs/guides/error-codes/api-errors
     83     \"\"\"
---> 84     return self._extract(OpenAICall, OpenAITool, retries, **kwargs)

File ~/src/AIBoostrapper/trially/llm-nlp-pipelines/.venv/lib/python3.12/site-packages/mirascope/base/extractors.py:167, in BaseExtractor._extract(self, call_type, tool_type, retries, **kwargs)
    165                 raise
    166 except RetryError as e:
--> 167     raise e
    168 return extraction

File ~/src/AIBoostrapper/trially/llm-nlp-pipelines/.venv/lib/python3.12/site-packages/mirascope/base/extractors.py:153, in BaseExtractor._extract(self, call_type, tool_type, retries, **kwargs)
    151 try:
    152     error_messages: dict[str, Any] = {}
--> 153     for attempt in retries:
    154         with attempt:
    155             try:

File ~/src/AIBoostrapper/trially/llm-nlp-pipelines/.venv/lib/python3.12/site-packages/tenacity/__init__.py:347, in BaseRetrying.__iter__(self)
    345 retry_state = RetryCallState(self, fn=None, args=(), kwargs={})
    346 while True:
--> 347     do = self.iter(retry_state=retry_state)
    348     if isinstance(do, DoAttempt):
    349         yield AttemptManager(retry_state=retry_state)

File ~/src/AIBoostrapper/trially/llm-nlp-pipelines/.venv/lib/python3.12/site-packages/tenacity/__init__.py:326, in BaseRetrying.iter(self, retry_state)
    324     if self.reraise:
    325         raise retry_exc.reraise()
--> 326     raise retry_exc from fut.exception()
    328 if self.wait:
    329     sleep = self.wait(retry_state)

RetryError: RetryError[<Future at 0x10b88eae0 state=finished raised AttributeError>]"
}
```

Setup
- mirascope version: 0.12.1
- python: 3.12.3

willbakst commented 5 months ago

Taking a look at this, some interesting/weird stuff is happening.

  1. For some reason, the model is returning an assistant message instead of a tool call, which is not currently handled. This is something that we should handle, and I will be working on a bug fix for this because users shouldn't have to worry about forcing the model to use a tool to get things to work.
  2. If I update the prompt to force tool usage (i.e. "Using your tools, extract...") the model responds with "The task details fail to provide essential information such as the due date for task completion. Please provide the missing information." This is sad :(
  3. If I update the task to have a due date, this works, but only sometimes as it still sometimes chooses to respond with an assistant message instead of a tool call. Another reason to implement a fix for this.
  4. If I use `gpt-4-turbo-2024-04-09` it works, but it provides a hallucinated due date. This can be resolved by giving `due_date` a default empty string: `due_date: str = ""`.
  5. We are throwing a retry error with the new retry update even if retries aren't set. This should not be the case, and we should instead be throwing the base error. We will also work on a fix for this.

Thank you for bringing this to our attention! I will hopefully get fixes for these out soon :)
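The default-value workaround from point 4 can be sketched as follows. This is just the reporter's `TaskDetails` model with an empty-string default, so the model is no longer pushed to hallucinate a due date when the task text doesn't contain one:

```python
from typing import Literal

from pydantic import BaseModel


class TaskDetails(BaseModel):
    # Default to "" so the extraction doesn't have to invent a date
    # when the task text doesn't mention one.
    due_date: str = ""
    description: str
    priority: Literal["low", "medium", "high"]


# Validation now succeeds even when the extraction omits due_date.
details = TaskDetails(description="Submit quarterly report", priority="high")
print(repr(details.due_date))  # → ''
```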

willbakst commented 5 months ago

@ramon-prieto the fix is now released in v0.12.2 :)

ramon-prieto commented 5 months ago

Thanks for the quick fix! I'm loving working with mirascope, it's so intuitive and bloat-free compared to other frameworks

willbakst commented 5 months ago

I'm glad you're liking the library! Thank you for giving us a try :)

If you ever find any other bugs or have any feature requests, post an issue and we'll get on it asap!

dearkafka commented 4 months ago

So what's the best practice for prompts to avoid this issue? It's still happening for me (I guess with longer prompts).

willbakst commented 4 months ago

@dearkafka is there a minimal example you could share that's still throwing this error?

My recommendation for "best practice" would be to set retries >= 2. Not ideal in that this will likely result in two calls instead of one, but it will feed the error back into the follow-up call, and so far I've seen it return the proper response on the second call every time.
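The retries-with-error-feedback pattern described above can be illustrated with a plain-Python sketch. This is only an illustration of the idea, not mirascope's actual implementation; the function and names here are made up for the example:

```python
def extract_with_retries(attempt_fn, retries=2):
    """Call attempt_fn until it succeeds, collecting error messages
    so each retry can see what went wrong previously."""
    error_messages = []
    for _ in range(retries):
        try:
            return attempt_fn(error_messages)
        except AttributeError as e:  # e.g. "No tool found in the completion."
            error_messages.append(str(e))
    raise RuntimeError(f"All {retries} attempts failed: {error_messages}")


# Toy attempt function: fails the first time, succeeds once it can
# "see" the previous error in its context.
def fake_attempt(errors):
    if not errors:
        raise AttributeError("No tool found in the completion.")
    return {"due_date": "", "description": "Submit report", "priority": "high"}


print(extract_with_retries(fake_attempt))
```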

Sometimes it's also the prompt itself. There's some prompt engineering techniques you can utilize like I mentioned in my previous comment to try and force tool use, but it doesn't always work.

I believe there's a new call param for OpenAI that you can use for forcing tool use, so that might also be worth a try. I think it is setting tool_choice: "required" in the call params.
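At the raw OpenAI API level, that option is part of the chat-completions request body. A minimal sketch of such a payload is below; the tool schema here is hand-written for illustration to mirror the `Book` example (mirascope normally generates it from the pydantic model automatically):

```python
# Sketch of a chat-completions request body with forced tool use.
payload = {
    "model": "gpt-4-turbo",
    "messages": [{"role": "user", "content": "Please recommend a fantasy book."}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "Book",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "title": {"type": "string"},
                        "author": {"type": "string"},
                    },
                    "required": ["title", "author"],
                },
            },
        }
    ],
    # "required" tells the API that the model must call some tool
    # rather than reply with a plain assistant message.
    "tool_choice": "required",
}
print(payload["tool_choice"])  # → required
```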

dearkafka commented 4 months ago

somehow even this is not working for me. yeah, I think prompt engineering should force it. thanks

```py
import os
from typing import Type

from fastapi import FastAPI
from mirascope.openai import OpenAIExtractor
from pydantic import BaseModel

app = FastAPI()


class Book(BaseModel):
    title: str
    author: str


class BookRecommender(OpenAIExtractor[Book]):
    extract_schema: Type[Book] = Book
    prompt_template = "Please recommend a {genre} book."

    genre: str


@app.post("/")
def root(book_recommender: BookRecommender) -> Book:
    """Generates a book based on provided `genre`."""
    return book_recommender.extract(model="gpt-4-turbo")
```

willbakst commented 4 months ago

Looks like this is just GPT being terrible right now for some reason. I just checked the output behind the `No tool found in the completion.` error using Logfire, and it was this:

> Here's a fantasy book recommendation for you:
>
> Title: "The Name of the Wind"
> Author: Patrick Rothfuss
>
> This book is the first in the "Kingkiller Chronicle" series. It follows the story of Kvothe, a gifted young man who grows up to be the most notorious magician his world has ever known. From his childhood in a traveling troupe of musicians to his years spent as a near-feral orphan in a crime-ridden city, to his daringly brazen yet successful bid to enter a legendary school of magic, Kvothe's life is anything but ordinary. With beautiful prose and rich storytelling, Rothfuss unfolds a compelling narrative that dives deep into the themes of identity, legend, and the nature of truth.

My suggestion would be to use Anthropic or Groq, they do a much more consistent job of this.

I also tried using `tool_choice="required"`, but for some reason the `stop_reason` is `stop` and not `tool_calls` :(

I will open a separate bug issue for this.

dearkafka commented 4 months ago

Yes, `tool_choice` also fails for me. Yep, something happened to gpt-4.

willbakst commented 4 months ago

I believe this is due to the addition of `tool_choice="required"` and the model internally being given more freedom in tool choice when it isn't required (meaning the prompt needs to force the tool if the option isn't set, but even that may be iffy).

Working on a fix for tool_choice in #256 so let's move any further discussion on this there (or a new issue should that fix not resolve something else that comes up).