Open RiHY99 opened 2 months ago
Hi @RiHY99,
You are correct. We evaluate the zero-shot benchmark with K=20, since ChatGPT produces different outputs across runs. For the algorithmic models, we assume they generate the same output on every run, so we use a single output for evaluation. I hope this helps!
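For anyone wanting to reproduce the comparison, here is a minimal sketch of how the two cases could be scored with the same code. It assumes the best-of-K (minADE/minFDE) convention commonly used on ETH/UCY, which this thread does not confirm. The names `ade_fde` and `best_of_k` are hypothetical and not taken from the repository. A deterministic method is passed as a list with one trajectory, while a stochastic one is passed as a list of K samples.

```python
import math

def ade_fde(pred, gt):
    """Average and final displacement error between one predicted
    trajectory and the ground truth, both lists of (x, y) points."""
    dists = [math.hypot(px - gx, py - gy)
             for (px, py), (gx, gy) in zip(pred, gt)]
    return sum(dists) / len(dists), dists[-1]

def best_of_k(samples, gt):
    """Assumed best-of-K protocol: take the lowest ADE and the lowest
    FDE over all samples. With a single sample this reduces to the
    ordinary ADE/FDE of that one trajectory."""
    errors = [ade_fde(sample, gt) for sample in samples]
    min_ade = min(ade for ade, _ in errors)
    min_fde = min(fde for _, fde in errors)
    return min_ade, min_fde

# Toy data, made up for illustration only.
gt = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]

# Deterministic model (e.g. a linear extrapolator): one output.
deterministic = [[(1.0, 1.0), (2.0, 1.0), (3.0, 1.0)]]

# Stochastic model: several sampled outputs (K=2 here, K=20 in the benchmark).
stochastic = deterministic + [[(1.0, 0.0), (2.0, 0.0), (3.0, 0.5)]]

print(best_of_k(deterministic, gt))
print(best_of_k(stochastic, gt))
```

Under this assumed protocol, adding samples can only keep the score the same or lower it, which is why a single-output method and a K=20 method are not scored on equal footing.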
Thanks for your work. Are the zero-shot results on ETH/UCY all computed from multimodal predictions? If so, is K equal to 20? And how are multimodal predictions produced for these methods: Linear, Kalman Filter, AutoTrajectory, and LMTraj-ZERO-GPT3.5/GPT4?