ShishirPatil / gorilla

Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)
https://gorilla.cs.berkeley.edu/
Apache License 2.0
11.22k stars, 936 forks

[BFCL] Sanity check should be optional and by default off #486

Closed ShishirPatil closed 2 months ago

ShishirPatil commented 2 months ago

For BFCL eval, the sanity check of whether all RESTful APIs are active should be an optional flag that is off by default. It is currently on by default.
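
As a rough illustration of an opt-in flag, here is a minimal sketch using `argparse`. The flag name `--api-sanity-check` and the `build_parser` helper are hypothetical, chosen for this example, and are not taken from the BFCL code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with an opt-in sanity-check flag (hypothetical name)."""
    parser = argparse.ArgumentParser(description="BFCL evaluation (sketch)")
    # store_true makes the flag default to False, so the check is off
    # unless the user passes --api-sanity-check explicitly.
    parser.add_argument(
        "--api-sanity-check",
        action="store_true",
        help="Probe the RESTful APIs before running the evaluation.",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    if args.api_sanity_check:
        print("Running RESTful API sanity check...")
    else:
        print("Skipping RESTful API sanity check.")
```

With this shape, running the script with no arguments skips the check, and passing the flag enables it.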

HuanzhiMao commented 2 months ago

I think it should be on by default. We want to warn users as soon as possible if an API is down, because that will make the results inaccurate. The user can then choose to ignore the warning and proceed with the evaluation if they want.

ShishirPatil commented 2 months ago

So, if a single RESTful API is down, right now we just exit the evaluation. Even the offline AST-based evals are not run, although they are independent of, and unaffected by, the online RESTful APIs. What we should do instead: if a set of online RESTful APIs is down (e.g. Yahoo Finance), we should flag it to the user so they know exactly what the error bounds are, and then continue execution.
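
A minimal sketch of the "flag it and continue" behavior might look like the following. Everything here is an assumption made for illustration: the helper names `find_down_apis` and `affected_categories`, the idea of passing in one probe callable per API group, and the category names in the usage example. None of it is the actual BFCL implementation.

```python
from typing import Callable, Dict, List


def find_down_apis(probes: Dict[str, Callable[[], bool]]) -> List[str]:
    """Return the names of API groups whose probe returned False or raised."""
    down = []
    for name, probe in probes.items():
        try:
            ok = probe()
        except Exception:
            # A probe that raises (timeout, connection error) counts as down.
            ok = False
        if not ok:
            down.append(name)
    return down


def affected_categories(
    dependencies: Dict[str, List[str]], down: List[str]
) -> Dict[str, List[str]]:
    """Map each test category to the down API groups it depends on.

    Categories with no online dependency (e.g. AST-based ones) map to an
    empty list, meaning their results are unaffected.
    """
    down_set = set(down)
    return {
        category: sorted(down_set.intersection(deps))
        for category, deps in dependencies.items()
    }


if __name__ == "__main__":
    # Hypothetical probes; real ones would issue HTTP requests.
    probes = {"yahoo_finance": lambda: False, "weather": lambda: True}
    # Hypothetical category-to-dependency mapping.
    dependencies = {"ast": [], "rest": ["yahoo_finance", "weather"]}

    impact = affected_categories(dependencies, find_down_apis(probes))
    for category, broken in impact.items():
        if broken:
            # Warn instead of exiting, so the user knows which scores to distrust.
            print(f"WARNING: '{category}' results may be inaccurate; down: {broken}")
    # The evaluation then proceeds for every category, affected or not.
```

The design choice here is that the check only reports which categories are affected; it never aborts, so the offline AST-based evals always run.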