Dealing with new model behaviors

MrCsabaToth commented 1 month ago

This is partially about the firebase-vertexai upgrade (#53), and also about the new release of the -002 stable models. It seems that function calling behavior and other things changed.

MrCsabaToth commented 1 month ago

New behavior: function calling changes

I switched over to the -002 models explicitly, and the function calling behavior changed. In the past (and in the submission demo) I could simply ask "What will be the weather tomorrow" or "What will be the weather next week". The model assumed (correctly) that I implicitly meant the weather at my current location, and resolved the dates relative to the current date/time.
The new model is very specific and picky; it doesn't treat anything as implied. It asks whether I'll stay at my current location tomorrow (or next week) before answering the question, and it also doesn't seem to be aware of the current date/time, so that needs to be stuffed into the prompt.
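A minimal sketch of what stuffing the date/time and location into the prompt could look like, written in Python purely for illustration. The function names, the wording of the preamble, and the use of latitude/longitude are all assumptions, not code from the app:

```python
from datetime import datetime, timezone


def build_context_preamble(now: datetime, latitude: float, longitude: float) -> str:
    """Return a preamble that states the current time and location explicitly.

    Hypothetical helper: the exact wording is an assumption and would need tuning.
    """
    return (
        f"Current date and time: {now.isoformat(timespec='minutes')}. "
        f"Current user location: latitude {latitude:.4f}, longitude {longitude:.4f}. "
        "Unless the user says otherwise, assume questions refer to this location "
        "and resolve relative dates (tomorrow, next week) against this time."
    )


def build_prompt(user_question: str, now: datetime, latitude: float, longitude: float) -> str:
    """Prepend the explicit context to the user's question."""
    return build_context_preamble(now, latitude, longitude) + "\n\n" + user_question


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(build_prompt("What will be the weather tomorrow?", now, 36.7378, -119.7871))
```

Whether such a preamble is enough to stop the follow-up questions about location would have to be checked against the actual model.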
The new models also cannot work out how to query the weather for "next week". The schema of the weather tool allows for start and end date parameters. The old model could easily deduce and fill in these two parameters to obtain next week's weather. The new model is lame and gives up; it somehow isn't able to figure out how to achieve this. It simply states that it can obtain the weather for a specific day but not for a week, which is false.
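For reference, the substitution the old model did on its own is simple date arithmetic. A rough sketch in Python, assuming "next week" means the Monday-to-Sunday calendar week following today; the argument names `start_date` and `end_date` are hypothetical stand-ins for whatever the weather tool schema actually calls its two parameters:

```python
from datetime import date, timedelta


def next_week_range(today: date) -> tuple[date, date]:
    """Return (Monday, Sunday) of the calendar week after the one containing `today`."""
    # date.weekday() returns 0 for Monday and 6 for Sunday.
    days_until_next_monday = 7 - today.weekday()
    start = today + timedelta(days=days_until_next_monday)
    end = start + timedelta(days=6)
    return start, end


def weather_tool_args(today: date) -> dict[str, str]:
    """Build hypothetical tool arguments covering next week as ISO dates."""
    start, end = next_week_range(today)
    return {"start_date": start.isoformat(), "end_date": end.isoformat()}


if __name__ == "__main__":
    print(weather_tool_args(date.today()))
```

If the intended meaning of "next week" is "the next seven days" instead, the range would simply start tomorrow and end seven days from today.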

MrCsabaToth commented 1 month ago

Another breaking change: function calling stopped working altogether, failing with the error "Please ensure that function call turn comes immediately after a user turn or after a function response turn." Others are dealing with this too (GD Community Gemini API thread): https://discord.com/channels/1009525727504384150/1289794849003802735
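The rule in that error message can be checked locally before a request is sent. A minimal sketch in Python, assuming the chat history is a plain list of dicts with a hypothetical `kind` field; this is not the SDK's real content type, just an illustration of the ordering constraint:

```python
# Hypothetical turn kinds: "user", "model_text", "function_call", "function_response".
ALLOWED_BEFORE_FUNCTION_CALL = {"user", "function_response"}


def find_misplaced_function_calls(history: list[dict]) -> list[int]:
    """Return indices of function-call turns that break the ordering rule.

    A function-call turn is valid only if the turn directly before it is a
    user turn or a function-response turn.
    """
    misplaced = []
    for index, turn in enumerate(history):
        if turn["kind"] != "function_call":
            continue
        if index == 0 or history[index - 1]["kind"] not in ALLOWED_BEFORE_FUNCTION_CALL:
            misplaced.append(index)
    return misplaced


if __name__ == "__main__":
    history = [
        {"kind": "user"},
        {"kind": "model_text"},
        {"kind": "function_call"},
    ]
    print(find_misplaced_function_calls(history))
```

A check like this only locates the offending turn in the history; how to repair the history depends on how the app assembles it.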

MrCsabaToth commented 4 weeks ago

So function calling is kinda solved, even though it is really nondeterministic: for example, for the "What's the weather tomorrow" question, about half of the time the model states it has no means to obtain it, as if it didn't see the weather tool at all. The other half of the time it just works.

The latest is that the model flat out states it cannot process images. The Pro seems to stick to this always, whereas the Flash can process the image (for example, it correctly OCRs the text on the food packaging), but then it continues by saying it cannot handle images. Say what?

CsabaConsulting / InspectorGadgetApp

Dealing with new model behaviors #56