Yifan-Song793 / ETO

Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)
https://arxiv.org/abs/2403.02502

How to run the evaluation? #6

Closed George-Chia closed 3 months ago

George-Chia commented 3 months ago

Thanks for your great work!

I cannot evaluate the model by following the README: enumerate(all_tasks) does not execute normally.

Is the code of this function complete? In eval_agent/tasks/base.py:

    @classmethod
    def load_tasks(cls, split: str, part_num: int, part_idx: int) -> Tuple[List["Task"], int]:
        pass

Yifan-Song793 commented 3 months ago

Thanks for your response~

The Task class in eval_agent/tasks/base.py is an abstract class, and load_tasks is an abstract method (I will add the decorator). The task-specific loading methods are implemented in eval_agent/tasks/webshop.py, eval_agent/tasks/sciworld.py, and eval_agent/tasks/alfworld.py.
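For illustration, here is a minimal sketch of the abstract-method pattern described above. It is not the repository's code: ToyTask, its fixed list of ten task ids, and the stride-based reading of part_num and part_idx are all assumptions made for the example, as is treating the returned int as the number of loaded tasks. Only the load_tasks signature is taken from the snippet quoted in this issue.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple


class Task(ABC):
    """Abstract base: declares the interface but holds no loading logic."""

    @classmethod
    @abstractmethod
    def load_tasks(cls, split: str, part_num: int, part_idx: int) -> Tuple[List["Task"], int]:
        """Subclasses are expected to override this method."""


class ToyTask(Task):
    """Hypothetical subclass, for illustration only (not in the repository)."""

    def __init__(self, task_id: int) -> None:
        self.task_id = task_id

    @classmethod
    def load_tasks(cls, split: str, part_num: int, part_idx: int) -> Tuple[List["Task"], int]:
        # Assumption: every split holds 10 tasks, and part_idx selects
        # every part_num-th task starting at offset part_idx.
        all_ids = list(range(10))
        part_ids = all_ids[part_idx::part_num]
        tasks = [cls(i) for i in part_ids]
        # Assumption: the int in the return value is the number of tasks.
        return tasks, len(tasks)


# Calling the concrete subclass yields something enumerate() can iterate over.
all_tasks, n_tasks = ToyTask.load_tasks("test", part_num=2, part_idx=0)
for i, task in enumerate(all_tasks):
    print(i, task.task_id)
```

Note that in Python the abstractmethod decorator only prevents instantiating Task itself. Calling Task.load_tasks(...) directly on the base class still runs the empty body and returns None, so the loader has to be called on a concrete subclass.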

Could you kindly specify the environment in which you ran the evaluation and share the full error message it produced?

George-Chia commented 3 months ago

Thanks!