ShishirPatil / gorilla

Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)
https://gorilla.cs.berkeley.edu/
Apache License 2.0

Question about AST evaluation for Java and JavaScript #477

Closed (lucenzhong closed this issue 2 months ago)

lucenzhong commented 3 months ago

https://github.com/ShishirPatil/gorilla/blob/2f39693df9868737e8074091d549b652d446d995/berkeley-function-call-leaderboard/model_handler/gpt_handler.py#L94C1-L109C30

Hello, may I ask why the GPT models in FC mode do not use tree-sitter for AST parsing in the Java and JavaScript test categories?

HuanzhiMao commented 3 months ago

Hi @lucenzhong, GPT models in FC mode don't need their output parsed for the Java and JavaScript categories. The model output is already in JSON format, and every parameter in the Java/JS tests is of type string, which is JSON-compatible. So `json.loads` is enough to get the result ready for the evaluation pipeline.

The actual parsing step, which turns each parameter value from a string into its 'real' type, happens later in the checker. The checker calls the Java type converter and the JS type converter, and both of these converters use tree-sitter. Let me know if you have more questions!
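To illustrate the split between decoding and checking, here is a minimal sketch. It is not the repository's code: the `fc_output` payload, `decode_fc_call`, and `toy_java_string_literal` are hypothetical names made up for this example, and the toy converter only strips quotes from a Java string literal, whereas the real converters rely on tree-sitter.

```python
import json

# Hypothetical FC-mode output: a function name plus a JSON string of arguments.
# In the Java/JS categories every parameter value is a string that holds
# source-language syntax, so plain JSON decoding is sufficient at this stage.
fc_output = {
    "name": "HashMap.put",
    "arguments": '{"key": "\\"user\\"", "value": "new ArrayList<String>()"}',
}


def decode_fc_call(call):
    """Decode step: JSON parsing only, no language-aware parsing."""
    return {call["name"]: json.loads(call["arguments"])}


def toy_java_string_literal(value):
    """Stand-in for a checker-side converter (assumption, not the real one).

    Handles only a double-quoted Java string literal without escapes.
    """
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        return value[1:-1]
    raise ValueError(f"not a Java string literal: {value!r}")


decoded = decode_fc_call(fc_output)
# The values are still raw strings after decoding.
print(decoded)
# Conversion to the 'real' type is deferred to the checker.
print(toy_java_string_literal(decoded["HashMap.put"]["key"]))
```

The point of the design is that the handler stays language-agnostic, while the language-specific work is concentrated in one place, the checker.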

P.S. This is unrelated to your question, but the two lines in the code section you referenced are unnecessary and should not exist (sorry, I just noticed). They introduce false positives into the results because they type-cast a second time, which gives the model a 'second chance' when a parameter type is wrong. This issue affects quite a few models, not just the OpenAI family. We will roll out a fix soon.
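A rough sketch of how a second type cast can produce a false positive follows. The two lines in question are not quoted in this thread, so `strict_check` and `lenient_check` are hypothetical functions that only mimic the effect described, under the assumption that the extra cast re-parses string values before the type comparison.

```python
import json


def strict_check(value, expected_type):
    """Accept the value only if it already has the expected type."""
    return type(value) is expected_type


def lenient_check(value, expected_type):
    """Mimics the 'second chance': re-parse strings before checking the type."""
    if isinstance(value, str):
        try:
            value = json.loads(value)  # the extra cast
        except json.JSONDecodeError:
            pass  # not valid JSON, keep the original string
    return type(value) is expected_type


# The model emitted the string "5" where an integer was expected.
model_value = "5"
print(strict_check(model_value, int))   # wrong type is rejected
print(lenient_check(model_value, int))  # wrong type slips through
```

With the extra cast, an answer with the wrong parameter type is scored as correct, which is the false positive mentioned above.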