I think this approach should be fairly flexible, and it limits the number of LLM calls to:
Initial assistant pass for tool use.
Vision tool: Always answers the user's query directly, then adds a web search query if it decides a web search is necessary. It can also set a reverse_image_search flag, which is currently rarely used.
We can disable reverse_image_search altogether if need be, then investigate bringing it back by identifying specific examples where we want it used.
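A minimal sketch of how the vision tool's output could drive the follow-up steps, with a single switch to turn reverse image search off. The field names (`answer`, `web_search_query`, `reverse_image_search`), the `ENABLE_REVERSE_IMAGE_SEARCH` constant, and `plan_followups` are hypothetical, chosen for illustration rather than taken from the actual implementation.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical global switch: when False, the model's reverse_image_search
# flag is ignored, which is one way to "disable it altogether" for now.
ENABLE_REVERSE_IMAGE_SEARCH = False


@dataclass
class VisionToolResult:
    answer: str                      # direct answer to the user's query
    web_search_query: Optional[str]  # empty/None when no web search is needed
    reverse_image_search: bool       # model's request for a reverse image search


def plan_followups(result: VisionToolResult,
                   enable_reverse: bool = ENABLE_REVERSE_IMAGE_SEARCH) -> list[str]:
    """Return the follow-up actions to run after the vision tool's direct answer."""
    actions = []
    # Only search when the model produced a non-blank query.
    if result.web_search_query and result.web_search_query.strip():
        actions.append("web_search")
    # Honour the model's flag only if the feature is switched on.
    if result.reverse_image_search and enable_reverse:
        actions.append("reverse_image_search")
    return actions
```

With the switch off, a result such as `VisionToolResult("A grey heron.", "grey heron habitat", True)` yields only `["web_search"]`; passing `enable_reverse=True` lets specific cases be re-enabled while collecting examples where reverse image search is actually wanted.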
Whenever it struggles to give an answer, it fails to return the response in JSON,
e.g. "I apologize ..."
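One way to keep the pipeline from breaking on these prose replies is to parse defensively and fall back to treating the raw text as the answer. This is a sketch under the assumption that the expected JSON uses the hypothetical keys `answer`, `web_search_query` and `reverse_image_search`; the `parse_failed` marker is likewise an illustrative addition.

```python
import json
from typing import Any


def parse_vision_output(raw: str) -> dict[str, Any]:
    """Parse the vision tool's reply, tolerating non-JSON answers."""
    fallback = {
        "answer": raw.strip(),
        "web_search_query": "",
        "reverse_image_search": False,
        "parse_failed": True,  # hypothetical marker for replies that were not JSON
    }
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # The model answered in prose (e.g. "I apologize ...") instead of JSON.
        return fallback
    if not isinstance(data, dict):
        # Valid JSON, but not the object shape we asked for.
        return fallback
    return {
        "answer": str(data.get("answer") or ""),
        "web_search_query": str(data.get("web_search_query") or ""),
        # Accept only a real JSON true, not truthy strings like "false".
        "reverse_image_search": data.get("reverse_image_search") is True,
        "parse_failed": False,
    }
```

Since a prose reply usually signals that the model is struggling, the `parse_failed` case might be a reasonable trigger for a retry or a web search on the user's original query, though that policy is an assumption to be validated.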
So far I have seen the search invoked only twice.
Most of the time the web search query is blank and/or reverse_image_search is false.
So for photo queries there is no internet search at all, whether from the description or via reverse image search.
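To put numbers on how rarely search fires, a small tally over logged responses could help. This sketch assumes each logged response has already been turned into a dict with the hypothetical keys `web_search_query`, `reverse_image_search` and `parse_failed`; `search_invocation_stats` is an illustrative helper, not part of the existing code.

```python
from collections import Counter
from typing import Iterable


def search_invocation_stats(results: Iterable[dict]) -> Counter:
    """Count how often logged vision-tool outputs would trigger each kind of search."""
    stats = Counter()
    for r in results:
        stats["total"] += 1
        # Replies that were not valid JSON (e.g. "I apologize ...").
        if r.get("parse_failed"):
            stats["parse_failed"] += 1
        # A non-blank query means a web search would be run.
        if (r.get("web_search_query") or "").strip():
            stats["web_search"] += 1
        # Only an explicit True counts as a reverse image search request.
        if r.get("reverse_image_search") is True:
            stats["reverse_image_search"] += 1
    return stats
```

Comparing `stats["web_search"]` and `stats["reverse_image_search"]` against `stats["total"]` turns the "only twice" observation into a rate that can be tracked as the prompt changes.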