Hi,

Thank you for the great work!

Given the current prevalence of Large Language Models (LLMs), are there any plans to include more LLM-based approaches in the performance evaluations, with a particular focus on zero-shot performance?
Here are a few relevant papers and approaches:
Hu, Yushi, et al. "In-Context Learning for Few-Shot Dialogue State Tracking." arXiv preprint arXiv:2203.08568 (2022).
Hudeček, Vojtěch, and Ondřej Dušek. "Are LLMs All You Need for Task-Oriented Dialogue?" arXiv preprint arXiv:2304.06556 (2023).
Heck, Michael, et al. "ChatGPT for Zero-shot Dialogue State Tracking: A Solution or an Opportunity?" arXiv preprint arXiv:2306.01386 (2023).
Chung, Willy, et al. "InstructTODS: Large Language Models for End-to-End Task-Oriented Dialogue Systems." arXiv preprint arXiv:2310.08885 (2023).
Li, Zekun, et al. "Large Language Models as Zero-shot Dialogue State Trackers through Function Calling." arXiv preprint arXiv:2402.10466 (2024).
Are there any plans to benchmark the performance of LLMs in zero-shot settings? I would be happy to assist with this if needed.
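To make the offer concrete, here is a rough sketch of what a zero-shot dialogue state tracking evaluation loop could look like. The slot names, prompt wording, and the `call_llm` helper are placeholders rather than anything from the existing benchmark code; joint goal accuracy is used as the usual DST metric.

```python
import json

# Hypothetical slot schema for a single domain (e.g. restaurant booking);
# a real benchmark would load these from the dataset ontology.
SLOTS = ["restaurant-area", "restaurant-food", "restaurant-pricerange"]

PROMPT_TEMPLATE = (
    "You are a dialogue state tracker. Given the conversation so far, return a "
    "JSON object mapping each of these slots to its value, or \"none\" if the "
    "slot has not been mentioned: {slots}\n\nConversation:\n{history}\n\nJSON:"
)


def track_state(history: str, call_llm) -> dict:
    """Query the LLM for the dialogue state of one turn, zero-shot.

    `call_llm` is a placeholder for whatever completion API is used
    (hosted or local); it takes a prompt string and returns raw text.
    """
    prompt = PROMPT_TEMPLATE.format(slots=", ".join(SLOTS), history=history)
    raw = call_llm(prompt)
    try:
        predicted = json.loads(raw)
    except json.JSONDecodeError:
        predicted = {}  # unparsable output is scored as an empty state
    # Keep only known slots and drop "none" values before comparison.
    return {s: v for s, v in predicted.items() if s in SLOTS and v != "none"}


def joint_goal_accuracy(predictions: list[dict], references: list[dict]) -> float:
    """Fraction of turns whose full predicted state exactly matches the gold state."""
    if not references:
        return 0.0
    correct = sum(pred == ref for pred, ref in zip(predictions, references))
    return correct / len(references)
```

A full run would iterate over the turns of a dataset such as MultiWOZ, accumulate per-turn predictions, and report joint goal accuracy per domain and overall.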
Hi @Leezekun - thanks for posting this. The simple answer is: absolutely! There are a number of ongoing efforts to work in a zero-shot manner. If you are happy to update the benchmarks, that would be very helpful!