Closed betterthanever2 closed 1 month ago
ALSO: It seems to me that in section Creating the first call, on the first line of the code block, it should be openai and not anthropic.
Good catch! I'm noting to fix this as part of the next docs update coming soon.
I'm pretty confused as to how the query is transferred from search function to the tool, since there is no mention of it whatsoever.
In my particular case, the search tool function takes more than 1 param. How do I feed those to the mechanism?
The input argument to the function (in this case `query`) is included in the call as part of the tool schema. Simply adding new arguments to the function will provide those arguments as part of the tool schema to the call, so there's nothing else you should need to do here.
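To make this concrete, here is a rough sketch (not Mirascope's actual implementation, which uses Pydantic under the hood) of how a framework can derive an OpenAI-style tool schema from a function signature — each parameter becomes a schema property, and parameters without defaults become required. The function name and type mapping here are illustrative:

```python
import inspect

def build_tool_schema(fn):
    """Derive a minimal tool schema from a function's signature and docstring."""
    sig = inspect.signature(fn)
    properties, required = {}, []
    for name, param in sig.parameters.items():
        # naive type mapping for illustration only
        json_type = "integer" if param.annotation is int else "string"
        properties[name] = {"type": json_type}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default => the LLM must supply it
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn),
        "parameters": {"type": "object", "properties": properties, "required": required},
    }

def search_the_web(query: str, top_n: int = 10) -> list:
    """Search the web for the given query."""

schema = build_tool_schema(search_the_web)
# query is required; top_n is optional because it has a default
```

Adding a second argument to the function adds a second property to the generated schema; that is the entire mechanism by which the LLM learns what arguments to provide.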
I'm realizing that the documentation is not particularly clear around what exactly is happening under the hood here. I've also noted this as part of our upcoming updates to ensure this is more clear.
Is there a convention or any nuances as to the naming of these tools?
There isn't any particular nuance to the naming, but there are two things you can do that we've found to help here:
I'm asking because right now I'm getting a Pydantic error about a missing argument:
It seems that what's missing is the original `tool_call` field of the tool. I'm not 100% sure what's going on here; it would be great if you could print the full original response (i.e. `response.response`) and share it here so we can have a better sense of what's happening.
Yesterday I came face to face with this same detail and was able to solve it in streaming mode (`stream=True`).
> It seems that what's missing is the original `tool_call` field of the tool. I'm not 100% sure what's going on here; it would be great if you could print the full original response (i.e. `response.response`) and share it here so we can have a better sense of what's happening.
What is `tool_call`? The guide doesn't mention this param. I can see it in the API reference, but it's not clear how to use it.
@willbakst I tried adding tenacity as per your recommendation. It didn't seem to help:
```
Previous Errors: [1 validation error for OpenAICallResponse
call_kwargs.tools.0.tool_call
  Field required [type=missing, input_value={'function': {'name': 'se...'}}, 'type': 'function'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.8/v/missing]
Previous Errors: [1 validation error for OpenAICallResponse
call_kwargs.tools.0.tool_call
  Field required [type=missing, input_value={'function': {'name': 'se...'}}, 'type': 'function'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.8/v/missing, 1 validation error for OpenAICallResponse
call_kwargs.tools.0.tool_call
  Field required [type=missing, input_value={'function': {'name': 'se...'}}, 'type': 'function'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.8/v/missing]
RetryError[<Future at 0x170ed8dff50 state=finished raised ValidationError>]
```
I'm probably doing it wrong. This is my code:
```python
@retry(stop=stop_after_attempt(3), after=collect_errors(ValidationError))
@openai.call(
    model="gpt-4o-mini",  ## TODO make sure this is the right model
    tools=[search_the_web_with_serper],
    call_params={"tool_choice": "required"},
)
@prompt_template(
    """
    {previous_errors}
    SYSTEM:
    You are an expert at finding information on the web.
    Use the `search_the_web_with_serper` function to find information on the web.
    Rewrite the search_query as needed to better find information on the web.
    USER:
    {search_query}
    """
)
def search(search_query: str, *, errors: list[ValidationError] | None = None) -> openai.OpenAIDynamicConfig:
    previous_errors = None
    if errors:
        previous_errors = f"Previous Errors: {errors}"
        print(previous_errors)
    return {"computed_fields": {"previous_errors": previous_errors}}


@openai.call(
    model="gpt-4o-mini",
    response_model=list[SearchResponse],
)
@prompt_template(
    """
    SYSTEM:
    Extract the search_query, results, and sources to answer the question based on the results.
    Results:
    {results}
    USER:
    {search_query}
    """
)
def extract(search_query: str, results: str): ...


def run(search_query: str):
    print(f"Running with search query: {search_query}")
    response = search(search_query)
    print(response.response)
    if tool := response.tool:
        output = tool.call()
        result = extract(search_query, output)
        return result


try:
    res = run("In what year was the Alexandria library destroyed?")
    print(res.response)
except Exception as e:
    print(e)
```
The `search_the_web_with_serper` function takes 5 arguments; I'm still not sure where they should be mentioned.
@betterthanever2 if you use it in streaming mode it won't give you the error:

```python
@openai.call(
    model="gpt-4o-mini",  ## TODO make sure this is the right model
    stream=True,  ## here
    tools=[search_the_web_with_serper],
    call_params={"tool_choice": "required"},
)
```

```python
stream = search(search_query)  # call the decorated function with its arguments
for _, tool in stream:
    if tool:
        output = tool.call()
```
> if you use it as a streaming it would not give you error

Well, I'm still getting an error, but a different one now: `'OpenAIStream' object has no attribute 'tool'`
> What is `tool_call`? The guide doesn't mention this param. I can see it in the API reference, but it's not clear how to use it.
This is the original tool call we use to construct the tool instance. It looks like the tool call itself is missing.
> I tried adding tenacity as per your recommendation. Didn't seem to help

You've included `{previous_errors}` before the SYSTEM message, which won't get parsed. You'll need to include it as part of the SYSTEM or USER message in the code you shared.
> The `search_the_web_with_serper` function takes 5 arguments; I'm still not sure where they should be mentioned.
Can you share the function? They should just be mentioned as the arguments.
> 'OpenAIStream' object has no attribute 'tool'

The `OpenAIStream` object only returns tools as part of the stream, which you can access by iterating through the stream to get the `(chunk, tool)` tuple. The stream object itself does not have a `tool` attribute, so calling `stream.tool` will fail.
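The iteration pattern can be illustrated with a tiny mock (this is a stand-in generator, not the real `OpenAIStream` class): tools only ever arrive as the second element of the yielded tuples, never as an attribute of the stream object.

```python
# Minimal mock of the (chunk, tool) iteration pattern.
def fake_stream():
    yield ("Let me search for that.", None)   # content chunk, no tool yet
    yield ("", "search_tool_instance")        # a completed tool call arrives

collected_tools = []
for chunk, tool in fake_stream():
    if tool:  # most chunks carry no tool, so always check
        collected_tools.append(tool)
```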
> Can you share the function? They should just be mentioned as the arguments.
Here it is:
```python
def search_the_web_with_serper(search_query: str, serper_api_key: str, jina_api_key: str, avoid_sources: list[str], top_n: int = 10):
    """
    Use Serper to retrieve top_n results for a given search query.
    1. Perform the search using Serper
    2. Filter out the results that are in the avoid_sources list
    3. For others, fetch content using Jina
    """
    serper_url = "https://google.serper.dev/search"
    payload = json.dumps({
        "q": search_query,
        "location": "United States",
        "num": top_n,
    })
    headers = {
        'X-API-KEY': serper_api_key,
        'Content-Type': 'application/json'}
    response = requests.request("POST", serper_url, headers=headers, data=payload)
    response_json = response.json()
    result = []
    for res in response_json.get("organic", []):
        if (any(source.lower().replace(" ", "_") in res.get("link").lower().replace("-", "_") for source in avoid_sources)) or \
           (any(source.lower() in res.get("title").lower() for source in avoid_sources)) or \
           (any(source.lower() in res.get("snippet").lower() for source in avoid_sources)):
            continue
        else:
            page_content = fetch_page_content_with_jina(res.get("link"), jina_api_key)
            if page_content:
                result.append({"url": res.get("link"), "content": page_content})
            else:
                print(f"Failed to fetch content for {res.get('link')}")
                continue
    return result
```
> The `OpenAIStream` object only returns tools as part of the stream, which you can access by iterating through the stream to get the `(chunk, tool)` tuple. The stream object itself does not have a `tool` attribute, so calling `stream.tool` will fail.
Yeah, I eventually figured it out after @Elimeleth updated their recommendation. However, my internet connection is bad at the moment, so I can't check; it keeps failing with a connection error.
Ok, first things first, I've identified a bug with calling tools and am working on a fix.
Separately, even after the fix is released, your `search_the_web_with_serper` function will need fixing.
The idea with tools is that you are giving the LLM a tool that it can request that you call. When making this request, the LLM will provide every argument needed to call the function.
In your case, you're asking the LLM to generate e.g. `serper_api_key`, which is going to be garbage, so my guess is that there is some uncaught failure here causing the downstream issue you're facing.
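A small illustration of why this matters (the payload below is hypothetical, not a real API response): every parameter in the tool's signature is something the LLM must supply, so secrets like `serper_api_key` end up hallucinated, and `tool.call()` then invokes your function with those hallucinated values.

```python
# Hypothetical tool call as the LLM might produce it:
llm_tool_call = {
    "name": "search_the_web_with_serper",
    "arguments": {
        "search_query": "library of alexandria destruction year",
        "serper_api_key": "not-a-real-key",  # invented by the model
        "top_n": 10,
    },
}

def dispatch(tool_call, registry):
    # tool.call() effectively does this: invoke the function with the
    # LLM-provided arguments, garbage keys included.
    return registry[tool_call["name"]](**tool_call["arguments"])

def fake_search(search_query, serper_api_key, top_n=10):
    return serper_api_key  # echo the key so we can see what the model sent

key_used = dispatch(llm_tool_call, {"search_the_web_with_serper": fake_search})
```

The real Serper request would then fail authentication with that invented key, which is the kind of uncaught failure described above.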
For things like API keys, my recommendation would be to set them in your environment and load them with `os` directly inside of the function rather than passing them in as arguments.
For other arguments that you may want to set more dynamically (including API keys and such if you desire), you can use `BaseToolKit` and dynamic tools to access those arguments as state of the toolkit through `self`. Here is an example:
```python
from mirascope.core import BaseToolKit, openai, prompt_template, toolkit_tool


class BookTools(BaseToolKit):
    reading_level: str

    @toolkit_tool
    def format_book(self, title: str, author: str) -> str:
        """Returns a book's title and author nicely formatted."""
        print(self.reading_level)
        return f"{title} by {author}"


@openai.call("gpt-4o-mini", call_params={"tool_choice": "required"})
@prompt_template("Recommend a {genre} book")
def recommend_book(genre: str, reading_level: str) -> openai.OpenAIDynamicConfig:
    toolkit = BookTools(reading_level=reading_level)
    return {"tools": toolkit.create_tools()}


response = recommend_book("fantasy", "beginner")
print(response.response)
assert (tool := response.tool) is not None
print(tool.call())
```
> For things like API keys, my recommendation would be to set them in your environment and load them with `os` directly inside of the function rather than passing them in as arguments.
Oh, I've seen such statements in the docs. Now I see there is a reason why it's done like that.
The tool function in this example returns a string, and I've seen somewhere in the docs that tools must return strings... Would the following implementation work?
```python
class SearchTools(BaseToolKit):
    @toolkit_tool
    def search_the_web_with_serper(search_query: str, top_n: int = 10):
        """
        Use Serper to retrieve top_n results for a given search query.
        1. Perform the search using Serper
        2. Filter out the results that are in the avoid_sources list
        3. For others, fetch content using Jina  ## NOTE as of the moment of writing this, Serper has recently launched a method for fetching the content of a page, but it is in beta and doesn't fetch everything
        """
        serper_api_key = os.getenv("SERPER_API_KEY")
        jina_api_key = os.getenv("JINA_API_KEY")
        avoid_sources = ["source.com", "anothersource.xyz"]
        serper_url = "https://google.serper.dev/search"
        payload = json.dumps({
            "q": search_query,
            "location": "United States",
            "num": top_n,
        })
        headers = {
            'X-API-KEY': serper_api_key,
            'Content-Type': 'application/json'}
        response = requests.request("POST", serper_url, headers=headers, data=payload)
        response_json = response.json()
        result = []
        for res in response_json.get("organic", []):
            if (any(source.lower().replace(" ", "_") in res.get("link").lower().replace("-", "_") for source in avoid_sources)) or \
               (any(source.lower() in res.get("title").lower() for source in avoid_sources)) or \
               (any(source.lower() in res.get("snippet").lower() for source in avoid_sources)):
                continue
            else:
                page_content = fetch_page_content_with_jina(res.get("link"), jina_api_key)
                if page_content:
                    result.append({"url": res.get("link"), "content": page_content})
                else:
                    print(f"Failed to fetch content for {res.get('link')}")
                    continue
        return result
```
Ok, the fix for this has been released in v1.1.1.
I'll mark this issue as closed, but feel free to reopen it if you're still running into issues.
Sorry I didn't notice your response, so reopening and removing bug label.
> Would the following implementation work?
For your implementation, you don't need to use `BaseToolKit` since you aren't using `self` anywhere. The point of `BaseToolKit` is to give the toolkit tools access to the state of the toolkit.
> The tool function in this example returns a string, and I've seen somewhere in the docs that tools must return strings...
Technically the tool does not have to return a string if you're just calling the tool. The reason we mention returning a string is for reinserting the tool call back into the history (where the message needs the tool output to be a string).
In your case you're just passing the results to another call, where the input to your extraction call can be whatever you'd like (so long as the `__str__` method works on it, as we'll call `str(var)` when formatting the prompt template).
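In other words, a non-string tool output such as a list of dicts is fine for this use case, because formatting the prompt template stringifies it. A minimal sketch (plain Python string formatting, standing in for the template machinery):

```python
# A tool output that is not a string: a list of result dicts.
results = [
    {"url": "https://example.com/a", "content": "page text"},
    {"url": "https://example.com/b", "content": "more text"},
]

# When {results} is formatted into the prompt, str(results) is what gets inserted.
prompt = "Results:\n{results}".format(results=results)
```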
@willbakst
Ok, how about this:
```python
class SearchTools(BaseToolKit):
    search_query: str
    top_n: int = 10
    serper_api_key: str = os.getenv("SERPER_API_KEY")
    jina_api_key: str = os.getenv("JINA_API_KEY")

    @toolkit_tool
    def search_the_web_with_serper(self):
        """
        Use Serper to retrieve top_n results for a given search query.
        1. Perform the search using Serper
        2. Filter out the results that are in the avoid_sources list
        3. For others, fetch content using Jina  ## NOTE as of the moment of writing this, Serper has recently launched a method for fetching the content of a page, but it is in beta and doesn't fetch everything
        """
        avoid_sources = ["wikipedia"]
        serper_url = "https://google.serper.dev/search"
        payload = json.dumps({
            "q": self.search_query,
            "location": "United States",
            "num": self.top_n,
        })
        headers = {
            'X-API-KEY': self.serper_api_key,
            'Content-Type': 'application/json'}
        response = requests.request("POST", serper_url, headers=headers, data=payload)
        response_json = response.json()
        result = []
        for res in response_json.get("organic", []):
            if (any(source.lower().replace(" ", "_") in res.get("link").lower().replace("-", "_") for source in avoid_sources)) or \
               (any(source.lower() in res.get("title").lower() for source in avoid_sources)) or \
               (any(source.lower() in res.get("snippet").lower() for source in avoid_sources)):
                continue
            else:
                page_content = fetch_page_content_with_jina(res.get("link"), self.jina_api_key)
                if page_content:
                    result.append({"url": res.get("link"), "content": page_content})
                else:
                    print(f"Failed to fetch content for {res.get('link')}")
                    continue
        return result
```
And then:
```python
@retry(stop=stop_after_attempt(3), after=collect_errors(ValidationError))
@openai.call(
    model="gpt-4o-mini",  ## TODO make sure this is the right model
    stream=True,
    call_params={"tool_choice": "required"},
)
@prompt_template(
    """
    SYSTEM:
    You are an expert at finding information on the web.
    Use the `search_the_web_with_serper` function to find information on the web.
    Rewrite the search_query as needed to better find information on the web.
    Take into account previous errors, if any: {previous_errors}
    USER:
    {search_query}
    """
)
def search(search_query: str, *, errors: list[ValidationError] | None = None) -> openai.OpenAIDynamicConfig:
    toolkit = SearchTools(search_query=search_query, top_n=15)
    previous_errors = None
    if errors:
        previous_errors = f"Previous Errors: {errors}"
        print(previous_errors)
    return {"computed_fields": {"previous_errors": previous_errors}, "tools": toolkit.create_tools()}
```
@betterthanever2 this is closer but may not actually be what you want.
By putting `search_query` as state of the toolkit and setting it based on the user's actual query, you're removing the LLM's ability to potentially update the search query to something better. If you include `search_query` as an argument of the tool, then the LLM can provide it, and `tool.call()` will use the `search_query` provided by the LLM. This may be 1:1 with the user's query, but it may also be updated.
An explicit example would be if you include the current date and time in the system prompt and then the user asks "recent LLM-related news" where the LLM may decide to update the search query to "news about large language models august 29th"
If you always want a 1:1 mapping, then the way you have it is totally fine.
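The distinction can be sketched in plain Python (a dataclass analogy, not the Mirascope API): fixed configuration lives on the toolkit object where the LLM can't touch it, while `search_query` stays a call argument the LLM is free to rewrite.

```python
from dataclasses import dataclass

@dataclass
class SearchToolkitSketch:
    top_n: int = 10  # toolkit state: set by your code, invisible to the LLM

    def search(self, search_query: str) -> str:
        # search_query is LLM-provided, so the model may rewrite the user's query
        return f"searched {search_query!r} with top_n={self.top_n}"

toolkit = SearchToolkitSketch(top_n=15)
# the LLM might rewrite "recent LLM-related news" before calling:
result = toolkit.search("news about large language models august 29th")
```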
> @betterthanever2 this is closer but may not actually be what you want.
>
> By putting `search_query` as state of the toolkit and setting it based on the user's actual query, you're removing the LLM's ability to potentially update the search query to something better. If you include `search_query` as an argument of the tool, then the LLM can provide it, and `tool.call()` will use the `search_query` provided by the LLM. This may be 1:1 with the user's query, but it may also be updated. An explicit example would be if you include the current date and time in the system prompt and then the user asks "recent LLM-related news", where the LLM may decide to update the search query to "news about large language models august 29th".
>
> If you always want a 1:1 mapping, then the way you have it is totally fine.
Thank you. This makes sense.
@willbakst Here's a question: suppose I want to instruct the agent to extract pieces of data and return them in a structured format. What do I need to change in the code above? I did the following:

1. Added a `Field` named `details` to the `SearchResponse` class with type `dict`.
2. Instructed the model to "Put all the details in a dictionary named `details` with the key as the detail and the value as the value of the detail."

This doesn't seem to be enough, because I'm getting Pydantic's validation error about a missing `details` field.
@betterthanever2 unfortunately OpenAI can't handle `dict` types when using tools, so you'll need to set `json_mode=True`.
You can find the full list of identified support for tools and JSON mode across all providers here
> unfortunately...
Hmm, setting `json_mode=True` gave me pretty much what I wanted. Nothing unfortunate about that :) Is the difference that for other models you don't need to set the json mode?
Anyway, thank you so much for your kind explanations! I really like the library so far.
> Is the difference that for other models you don't need to set the json mode?
Yeah, some models support `dict` type arguments when using tools. For `response_model` this isn't too big of a deal since you can set `json_mode=True`, but when you want to use `dict` types with tools it matters.
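One common workaround when a provider's tool schema can't express a `dict` parameter is to accept the details as a JSON-encoded string and decode it inside the tool. This is a hypothetical sketch (`record_details` is an illustrative name, not a library API):

```python
import json

def record_details(details_json: str) -> dict:
    """details_json: a JSON object mapping each detail name to its value.

    A string parameter is expressible in any provider's tool schema, and the
    tool decodes it back into a dict before using it.
    """
    return json.loads(details_json)

details = record_details('{"year": "48 BC", "source": "Plutarch"}')
```

The trade-off is that you lose schema-level validation of the dict's shape, so you may want to validate the decoded value yourself.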
> Anyway, thank you so much for your kind explanations! I really like the library so far.
Always happy to help! So glad you're liking the library so far :)
Question
Hi. My first day working with Mirascope. I'm trying to adapt this guide: https://mirascope.io/docs/latest/cookbook/search_with_sources/
In this guide, the function `nimble_google_search` has 1 param named `query`. When defining the `call`, it is listed under `tools` without indication of any params it takes. I'm pretty confused as to how the query is transferred from the `search` function to the tool, since there is no mention of it whatsoever. In my particular case, the search tool function takes more than 1 param. How do I feed those to the mechanism?
Is there a convention or any nuances as to the naming of these tools? I'm asking because right now I'm getting a Pydantic error about a missing argument. Although the error response is not particularly clear due to cutting off the name of the missing param, I did figure out that it's the name of the search function I'm using, `search_the_web`.

ALSO: It seems to me that in section Creating the first call, on the first line of the code block, it should be `openai` and not `anthropic`.