Open BatmanofZuhandArrgh opened 7 months ago
Hi,

How did you guys evaluate on ALFRED? Skimming through it, it seems to require some .pth deep learning model files. Did you use this codebase: https://github.com/lbaa2022/LLMTaskPlanning?

Also, how did LLM-Planner do on the 192 AI2Thor games? I didn't find any info on this in your paper.

Thank you!

We used HLSM's low-level controller as our low-level controller (as described in our paper). For the 192 ALFWorld games, we don't have separate statistics, but since they are a subset of the ALFRED evaluation tasks, I assume the performance will be similar.
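For readers unfamiliar with the setup described in the reply, here is a minimal sketch of how a hierarchical agent of this kind might be wired together: an LLM-based high-level planner emits subgoals, and a separately trained low-level controller (e.g. HLSM's, loaded from pretrained .pth checkpoints) turns each subgoal into primitive AI2-THOR actions. All class names, method names, and the subgoal format below are illustrative assumptions, not the actual LLM-Planner or HLSM APIs.

```python
from typing import List, Tuple

Subgoal = Tuple[str, str]  # e.g. ("Navigation", "fridge") or ("PickupObject", "apple")


class LLMHighLevelPlanner:
    """Sketch of an LLM-based high-level planner (names are assumptions)."""

    def plan(self, instruction: str, completed: List[Subgoal]) -> List[Subgoal]:
        # In the real system this would prompt the LLM with the task instruction,
        # in-context examples, and the subgoals completed so far.
        raise NotImplementedError


class LowLevelController:
    """Stand-in for an HLSM-style low-level controller loaded from .pth checkpoints."""

    def execute(self, subgoal: Subgoal, env) -> bool:
        # Maps one subgoal to a sequence of primitive actions
        # (MoveAhead, RotateLeft, Pickup, ...) and reports success/failure.
        raise NotImplementedError


def run_episode(instruction: str, env, planner: LLMHighLevelPlanner,
                controller: LowLevelController, max_replans: int = 3) -> bool:
    """Alternate high-level planning and low-level execution until the plan is exhausted."""
    completed: List[Subgoal] = []
    queue = list(planner.plan(instruction, completed))
    replans = 0
    while queue:
        subgoal = queue.pop(0)
        if controller.execute(subgoal, env):
            completed.append(subgoal)
        elif replans < max_replans:
            # Dynamic replanning: ask the planner for a fresh plan conditioned
            # on what has already been achieved, then continue from there.
            queue = list(planner.plan(instruction, completed))
            replans += 1
        else:
            break
    return env.task_succeeded()  # hypothetical success check provided by the environment wrapper
```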