Open BatmanofZuhandArrgh opened 7 months ago
Hi,

How did you guys evaluate on ALFRED? Skimming through it, it seems to require some .pth deep learning model files. Did you use this codebase: https://github.com/lbaa2022/LLMTaskPlanning?

Also, how did LLM-Planner do on the 192 AI2Thor games? I didn't find any info on this in your paper.

Thank you!

We used HLSM's low-level controller as our low-level controller (as described in our paper). For the 192 ALFWorld games, we don't have separate statistics, but since they are a subset of the ALFRED evaluation tasks, I assume the performance will be similar.
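For readers unfamiliar with the setup described in the reply, here is a minimal sketch of how a hierarchical agent of this kind might be wired together: an LLM-based high-level planner emits subgoals, and a separately trained low-level controller (e.g. HLSM's, loaded from pretrained .pth checkpoints) turns each subgoal into primitive AI2-THOR actions. All class names, method names, and the subgoal format below are illustrative assumptions, not the actual LLM-Planner or HLSM APIs.

```python
from typing import List, Tuple

Subgoal = Tuple[str, str]  # e.g. ("Navigation", "fridge") or ("PickupObject", "apple")


class LLMHighLevelPlanner:
    """Sketch of an LLM-based high-level planner (names are assumptions)."""

    def plan(self, instruction: str, completed: List[Subgoal]) -> List[Subgoal]:
        # In the real system this would prompt the LLM with the task instruction,
        # in-context examples, and the subgoals completed so far.
        raise NotImplementedError


class LowLevelController:
    """Stand-in for an HLSM-style low-level controller loaded from .pth checkpoints."""

    def execute(self, subgoal: Subgoal, env) -> bool:
        # Maps one subgoal to a sequence of primitive actions
        # (MoveAhead, RotateLeft, Pickup, ...) and reports success/failure.
        raise NotImplementedError


def run_episode(instruction: str, env, planner: LLMHighLevelPlanner,
                controller: LowLevelController, max_replans: int = 3) -> bool:
    """Alternate high-level planning and low-level execution until the plan is exhausted."""
    completed: List[Subgoal] = []
    queue = list(planner.plan(instruction, completed))
    replans = 0
    while queue:
        subgoal = queue.pop(0)
        if controller.execute(subgoal, env):
            completed.append(subgoal)
        elif replans < max_replans:
            # Dynamic replanning: ask the planner for a fresh plan conditioned
            # on what has already been achieved, then continue from there.
            queue = list(planner.plan(instruction, completed))
            replans += 1
        else:
            break
    return env.task_succeeded()  # hypothetical success check provided by the environment wrapper
```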