Hi,

Thank you for the great work!

Given the current prevalence of Large Language Models (LLMs), are there any plans to include more LLM-based approaches in the performance evaluations, with a particular focus on zero-shot performance?
Here are a few relevant papers and approaches:
Hu, Yushi, et al. "In-Context Learning for Few-Shot Dialogue State Tracking." arXiv preprint arXiv:2203.08568 (2022).
Hudeček, Vojtěch, and Ondřej Dušek. "Are LLMs All You Need for Task-Oriented Dialogue?" arXiv preprint arXiv:2304.06556 (2023).
Heck, Michael, et al. "ChatGPT for Zero-shot Dialogue State Tracking: A Solution or an Opportunity?" arXiv preprint arXiv:2306.01386 (2023).
Chung, Willy, et al. "InstructTODS: Large Language Models for End-to-End Task-Oriented Dialogue Systems." arXiv preprint arXiv:2310.08885 (2023).
Li, Zekun, et al. "Large Language Models as Zero-shot Dialogue State Trackers through Function Calling." arXiv preprint arXiv:2402.10466 (2024).
Are there any plans to benchmark the performance of LLMs in zero-shot settings? I would be happy to assist with this if needed.
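To make the offer concrete, here is a rough sketch of what a zero-shot dialogue state tracking evaluation loop could look like. The slot names, prompt wording, and the `call_llm` helper are placeholders rather than anything from the existing benchmark code; joint goal accuracy is used as the usual DST metric.

```python
import json

# Hypothetical slot schema for a single domain (e.g. restaurant booking);
# a real benchmark would load these from the dataset ontology.
SLOTS = ["restaurant-area", "restaurant-food", "restaurant-pricerange"]

PROMPT_TEMPLATE = (
    "You are a dialogue state tracker. Given the conversation so far, return a "
    "JSON object mapping each of these slots to its value, or \"none\" if the "
    "slot has not been mentioned: {slots}\n\nConversation:\n{history}\n\nJSON:"
)


def track_state(history: str, call_llm) -> dict:
    """Query the LLM for the dialogue state of one turn, zero-shot.

    `call_llm` is a placeholder for whatever completion API is used
    (hosted or local); it takes a prompt string and returns raw text.
    """
    prompt = PROMPT_TEMPLATE.format(slots=", ".join(SLOTS), history=history)
    raw = call_llm(prompt)
    try:
        predicted = json.loads(raw)
    except json.JSONDecodeError:
        predicted = {}  # unparsable output is scored as an empty state
    # Keep only known slots and drop "none" values before comparison.
    return {s: v for s, v in predicted.items() if s in SLOTS and v != "none"}


def joint_goal_accuracy(predictions: list[dict], references: list[dict]) -> float:
    """Fraction of turns whose full predicted state exactly matches the gold state."""
    if not references:
        return 0.0
    correct = sum(pred == ref for pred, ref in zip(predictions, references))
    return correct / len(references)
```

A full run would iterate over the turns of a dataset such as MultiWOZ, accumulate per-turn predictions, and report joint goal accuracy per domain and overall.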
Hi @Leezekun - thanks for posting this. The simple answer is: absolutely! There are a number of ongoing efforts to work in a zero-shot manner. If you are happy to update the benchmarks, that would be very helpful!