THUNLP-MT / StableToolBench

A new tool learning benchmark aiming at well-balanced stability and reality, based on ToolBench.
https://zhichengg.github.io/stb.github.io/
Apache License 2.0
81 stars 11 forks

How do native LLMs perform on this benchmark? #12

Open YenFuLin opened 1 month ago

YenFuLin commented 1 month ago

Hi, I'm wondering why this benchmark doesn't include results for native LLMs (such as llama2, llama3). Do you plan to add these results to this work?

zhichengg commented 3 weeks ago

Hi, thank you for your question.

We have not tested these open-source models yet, but doing so is on the roadmap.