How do native LLMs perform on this benchmark?

THUNLP-MT / StableToolBench

A new tool learning benchmark aiming at well-balanced stability and reality, based on ToolBench.

https://zhichengg.github.io/stb.github.io/

Apache License 2.0

Open YenFuLin opened 1 month ago

YenFuLin commented 1 month ago

Hi, I'm wondering why this benchmark doesn't include results for native LLMs (such as llama2, llama3). Do you plan to add these results to this work?

zhichengg commented 3 weeks ago

Hi, thank you for your question.

We have not tested these open-source models yet, but doing so is on the roadmap.