How were the expert trajectories collected?

Yifan-Song793 / ETO

Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)

https://arxiv.org/abs/2403.02502


Closed by Fu-Dayuan 5 months ago

Fu-Dayuan commented 6 months ago

As the title says: were the expert trajectories sampled from ChatGPT (or GPT-4), or from llama-chat? I noticed that even the SFT version scores much higher than the llama-chat version.

Yifan-Song793 commented 6 months ago

Hello, and thank you for your interest in our work!

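For WebShop, part of the expert trajectories come from the human demonstrations provided by the WebShop authors; for the rest, we used GPT-4 to explore the environment and kept only trajectories with final reward >= 0.7.
ScienceWorld provides an algorithm that automatically generates golden trajectories; we preprocessed these and used GPT-4 to annotate the CoT.
For ALFWorld, we preprocessed the human demonstrations in the original data to obtain the expert trajectories, and used GPT-4 to annotate the CoT.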
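
The WebShop filtering step described above (keep GPT-4 exploration trajectories whose final reward is at least 0.7) could look roughly like the sketch below. This is only an illustration and not the ETO code: the record layout (`task_id`, `final_reward`) and the function name `filter_expert_trajectories` are hypothetical, and only the 0.7 threshold comes from the reply.

```python
from typing import Any, Dict, List

# Threshold quoted in the reply; everything else here is an assumption.
REWARD_THRESHOLD = 0.7


def filter_expert_trajectories(
    trajectories: List[Dict[str, Any]],
    threshold: float = REWARD_THRESHOLD,
) -> List[Dict[str, Any]]:
    """Keep trajectories whose final reward is at least `threshold`."""
    return [t for t in trajectories if t["final_reward"] >= threshold]


# Hypothetical exploration results, one record per sampled trajectory.
explored = [
    {"task_id": "a", "final_reward": 1.0},
    {"task_id": "b", "final_reward": 0.7},
    {"task_id": "c", "final_reward": 0.4},
]

kept = filter_expert_trajectories(explored)
print([t["task_id"] for t in kept])  # prints ['a', 'b']
```

The comparison is inclusive, so a trajectory with a reward of exactly 0.7 is kept, matching the ">= 0.7" wording.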

yananchen1989 commented 5 months ago

Hello, could you share the expert trajectories used for training in this project?

Thanks.

Yifan-Song793 commented 5 months ago

Hello, setup.sh automatically downloads the expert trajectories for all three environments (WebShop, ScienceWorld, and ALFWorld). You can also download them here: https://drive.google.com/file/d/1YbhbL8RhQGDWFv5y6k1qgwRqSyFFsao8/view?usp=drive_link