danieljannai21 closed this issue 4 months ago.
Hi @danieljannai21,
Thank you for your attention and thanks for flagging this! Sorry for the delayed reply; we were a bit busy recently :/
Best, BFCL Team
Hi @HuanzhiMao, thank you for your answer!
Regarding my second point, I still don't understand: even in AST evaluation, why would we want to enforce typing in the case of an integer value for a parameter typed as a float? Wouldn't it be 100% valid to use an int in that case, since the conversion can be done seamlessly without any information loss?
Also, I'm not sure whether it's something you addressed in your data fixes, but I noticed that in the Java category (and possibly other ones as well, though I didn't read everything thoroughly), the questions are formatted as "How do I
Hi @danieljannai21 ,
Good question. Let me give you an example where type matters.
Let's say the function doc asks for an integer-type argument `x`, and the source code of that function is defined as:
```python
def func(x: int):
    for i in range(1, 100 // x):
        print(i)
```
Here, if you follow the function doc and call `func(5)`, it works fine. But if you call `func(5.0)`, you get the following error:
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in func
TypeError: 'float' object cannot be interpreted as an integer
```
Since the model doesn't have access to the function source code, it cannot safely assume that it's okay to mess up the type. Moreover, in statically typed languages like Java, the compiler rejects a call that passes a floating-point value where the parameter type is `int`; the code simply doesn't compile.
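One possible middle ground between the two positions in this thread is an asymmetric check: accept an `int` where a float is declared (a widening conversion), but keep rejecting a float where an `int` is declared, which is exactly the `func(5.0)` failure above. The sketch below is hypothetical: `matches_declared_type` is an illustrative name, it assumes JSON-Schema-like type names (`"integer"`, `"float"`, `"string"`, `"boolean"`) in the function doc, and it is not BFCL's actual checker.

```python
# Hypothetical sketch, assuming JSON-Schema-like type names in the function doc.
# Not BFCL's actual checker.


def matches_declared_type(value, declared: str) -> bool:
    """Return True if `value` is acceptable for a parameter declared as `declared`."""
    # bool is a subclass of int in Python, so handle it before the numeric checks.
    if isinstance(value, bool):
        return declared == "boolean"
    if declared == "integer":
        # Only a real int passes: 5.0 is rejected even though its value is integral,
        # because code like range(1, 100 // x) fails on it.
        return isinstance(value, int)
    if declared in ("float", "number"):
        # Widening int -> float is accepted (exact for |n| <= 2**53).
        return isinstance(value, (int, float))
    if declared == "string":
        return isinstance(value, str)
    # Any other declared type is out of scope for this sketch.
    return False


print(matches_declared_type(34, "float"))     # True: integers pass a float-typed check
print(matches_declared_type(5.0, "integer"))  # False: the func(5.0) case is still rejected
print(matches_declared_type(5, "integer"))    # True
```

The design choice is deliberate: widening never changes the value a Python or Java callee sees in a way that breaks integer-only operations, while narrowing (float to int) can, as the traceback shows.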
Regarding your second concern: thank you for pointing out how we format the prompt. We do observe questions starting with "How do I ..." or similar, which ask questions rather than give tool-use instructions. Here are 2 scenarios:
Should you decide to return the function call(s), Put it in the format of [func1(params_name=params_value, params_name2=params_value2...), func2(params)] \n NO other text MUST be included.
) such that no explanation should be provided if the model followed the instructions. With the above, if a model outputs a complete explanation, it is either not a user-friendly function-calling model or it is not following the prompting instructions carefully. We would love to have the questions rephrased more diversely in tone to better simulate real-world tool usage, but that will possibly come in a future release.
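To illustrate why any extra prose breaks scoring under this output format, here is a minimal sketch of parsing `[func1(params_name=params_value, ...), func2(params)]` with Python's standard `ast` module. The helper `parse_function_calls` and the `sort_array` function in the usage lines are hypothetical, BFCL's real parser may differ, and the sketch assumes keyword arguments with literal values only (`ast.unparse` requires Python 3.9+).

```python
import ast


def parse_function_calls(output: str) -> list:
    """Parse '[f(a=1), g(b="x")]' into [(name, kwargs), ...]; raise ValueError otherwise."""
    try:
        tree = ast.parse(output.strip(), mode="eval")
    except SyntaxError as exc:
        # Any explanation text around the list makes the whole string unparsable.
        raise ValueError(f"not a bare list of calls: {exc}") from exc
    if not isinstance(tree.body, ast.List):
        raise ValueError("top-level expression must be a list")
    calls = []
    for node in tree.body.elts:
        if not isinstance(node, ast.Call) or node.args:
            raise ValueError("each element must be a call with keyword arguments only")
        kwargs = {}
        for kw in node.keywords:
            if kw.arg is None:  # reject **kwargs unpacking
                raise ValueError("**kwargs unpacking is not allowed")
            # literal_eval only accepts literals (numbers, strings, lists, dicts, ...).
            kwargs[kw.arg] = ast.literal_eval(kw.value)
        # ast.unparse turns dotted names like math.sqrt back into text.
        calls.append((ast.unparse(node.func), kwargs))
    return calls


print(parse_function_calls("[sort_array(array=[34, 2, 56], reverse=True)]"))
# [('sort_array', {'array': [34, 2, 56], 'reverse': True})]

try:
    parse_function_calls("Sure! Call sort_array(array=[34, 2, 56], reverse=True).")
except ValueError as err:
    print("rejected:", err)
```

A response that wraps an otherwise correct call in an explanation fails at the `ast.parse` step, so it cannot be matched against the gold answer at all.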
Thanks, @HuanzhiMao!
I agree that the model should be able to output a function call in a parsable manner, but I don't think that's what you're checking in the "How do I ..." questions. The phrasing of these questions doesn't suggest that the user wants the model to call a function; rather, it asks the model how to do something, in which case the model can tell the user to call the relevant function. In my opinion, phrasing the questions as "Please do ..." would yield much better results for most models and would better evaluate their function-calling capabilities. Also, the "NO other text MUST be included" instruction is only added for the prompted models, not the FC models.
Another thing I noticed is that there are cases where the gold answer contains a parameter that doesn't appear in the description of the function (for example, the `price` parameter doesn't exist in the description of the `book_room` function in question 46 of the `execution_multiple_function` category). I would suggest running an automatic validation that compares the function's documentation in the `tools` section, the way it's invoked in the gold answer, and the actual implementation (if one exists), and makes sure they're all consistent. It shouldn't be that hard to implement.
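A minimal sketch of such a consistency check, assuming each function doc lists its parameters JSON-Schema style under `parameters["properties"]` with a `required` list, and that the gold answer is a plain mapping from parameter names to values. `check_consistency` is a hypothetical name, and the `book_room` stub, its doc, and the gold values below are simplified illustrations rather than the dataset's actual entries.

```python
import inspect
from typing import Callable, Optional


def check_consistency(doc: dict, gold_args: dict, impl: Optional[Callable] = None) -> list:
    """Hypothetical check: report mismatches between a doc, a gold call, and an implementation."""
    problems = []
    props = set(doc["parameters"].get("properties", {}))
    required = set(doc["parameters"].get("required", []))

    # Gold answer uses a parameter the doc never mentions.
    for name in sorted(set(gold_args) - props):
        problems.append(f"gold answer uses undocumented parameter '{name}'")
    # Gold answer omits a parameter the doc marks as required.
    for name in sorted(required - set(gold_args)):
        problems.append(f"gold answer omits required parameter '{name}'")

    if impl is not None:
        # Compare documented names with the implementation's signature
        # (this sketch does not special-case *args / **kwargs).
        sig_params = set(inspect.signature(impl).parameters)
        for name in sorted(props - sig_params):
            problems.append(f"documented parameter '{name}' missing from implementation")
        for name in sorted(sig_params - props):
            problems.append(f"implementation parameter '{name}' is undocumented")
    return problems


def book_room(room_type, check_in_date, check_out_date, customer_id, price=None):
    """Hypothetical stub standing in for a real executable implementation."""


doc = {
    "name": "book_room",
    "parameters": {
        "type": "dict",
        "properties": {
            "room_type": {"type": "string"},
            "check_in_date": {"type": "string"},
            "check_out_date": {"type": "string"},
            "customer_id": {"type": "string"},
        },
        "required": ["room_type", "check_in_date", "check_out_date", "customer_id"],
    },
}
gold = {"room_type": "king", "check_in_date": "12-11-2023",
        "check_out_date": "08-15-2024", "customer_id": "123", "price": 10000.0}

for problem in check_consistency(doc, gold, book_room):
    print(problem)
# gold answer uses undocumented parameter 'price'
# implementation parameter 'price' is undocumented
```

Run over every entry, a check like this would surface the `price` / `book_room` mismatch automatically; a real version would also need to handle possible-answer lists and optional parameters with defaults.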
Thanks again for the wonderful work you guys are doing.
Hi,
First of all, I really appreciate your work!
I've been trying to run the leaderboard evaluation code and noticed several issues. I can probably fix some of them on my own in a PR, but wanted to hear your thoughts before that.
- In this function, it isn't mentioned that `interest_rate` shouldn't be given as a percentage (that is, 3.5 instead of 0.035, for example).
- In this example, it isn't mentioned that `stock_name` should be a ticker symbol and not the company's name.
- Hard typing constraints. For example, one of the questions in `gorilla_openfunctions_v1_test_executable_multiple_function.json` is "As a data analyst, you are working on a project that requires you to organize a set of numerical data. Can you sort these numbers 34, 2, 56, 7, 9, 12 in descending order?", where the description of the relevant function is:
  Since the given numbers are all integers, there shouldn't be a problem in sending them to the function as ints rather than floats. Generally speaking, I think the typing constraints on floats should also accept ints, as long as their values are correct.
- Several incorrect "gold" answers. For example, in `gorilla_openfunctions_v1_test_executable_simple.json`, one of the questions is "Book a king room of 10000 dollar from Dec.11,2023, to Aug.15,2024, with customer id 123.", while the gold answer states that the value of `total_price` should be 1000 rather than 10000. I also found a few cases where the list of values in "possible_answer" wasn't exhaustive enough, but I can't find a specific example right now.
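Some incorrect gold values, like the `total_price` case, could be caught by a cheap heuristic: flag numeric gold arguments whose value never appears among the numbers written in the question. This is a hypothetical sketch (`suspicious_numbers` is an illustrative name, and the gold dict is a simplified rendering, not the dataset's actual possible-answer format). It assumes numbers are written as plain digits without thousands separators, so it will also flag legitimately derived values (unit conversions, computed totals) and is only meant to prioritize entries for manual review.

```python
import re


def numbers_in_text(text: str) -> set:
    """Extract plain-digit numbers (optionally with a decimal part) from text."""
    return {float(token) for token in re.findall(r"\d+(?:\.\d+)?", text)}


def suspicious_numbers(question: str, gold_args: dict) -> dict:
    """Hypothetical check: return gold arguments whose numeric value is absent from the question."""
    seen = numbers_in_text(question)
    flagged = {}
    for name, value in gold_args.items():
        # bool is an int subclass; skip it along with non-numeric values.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            continue
        if float(value) not in seen:
            flagged[name] = value
    return flagged


question = ("Book a king room of 10000 dollar from Dec.11,2023, to Aug.15,2024, "
            "with customer id 123.")
gold = {"room_type": "king", "total_price": 1000, "customer_id": 123}
print(suspicious_numbers(question, gold))  # {'total_price': 1000}
```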
Thanks in advance!