Cranial-XIX / llm-pddl


Question about validation. #6

Closed suoych closed 11 months ago

suoych commented 1 year ago

Hi, thanks for sharing the great work. I am wondering how to calculate the success rate reported in the paper. I ran the validation program in the floortile domain and got 6 valid plans in run1, 6 in run2, and 6 in run3, so I am not sure how to arrive at the 53.3% result reported in the paper. Moreover, may I ask how to evaluate the success rate of plans generated by the other baseline methods? The validation file only covers the LLM+P-ic method. Thanks for your patience; I am really looking forward to your reply.

haomengz commented 1 year ago

Similar problem here. How do you calculate the success rate? In issue #4, the authors mentioned they were using VAL. Maybe try that, @suoych.

YuqianJiang commented 11 months ago

Hi @suoych, thanks for pointing this out. The version on arXiv reported the success rates of a previous batch of experiments. We are in the process of updating them.

We manually evaluated the natural language plans generated by the baseline methods. To make this easier, we ran the optimal planner on the ground-truth problem PDDL files and compared the outputs. If the plan from the LLM has fewer steps than the optimal plan (which happens a lot), we know the plan is incorrect.
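
For anyone who wants to reproduce the step-count check described above, here is a minimal sketch. It is not code from this repository. It assumes each plan has already been written as plain text with one action per line, and that lines starting with `;` are comments (for example a trailing cost line). The function names and the example actions are hypothetical.

```python
def count_steps(plan_text: str) -> int:
    """Count actions in a plan, assuming one action per line.

    Blank lines and lines starting with ';' (assumed to be comments,
    such as a trailing cost line) are not counted.
    """
    steps = 0
    for line in plan_text.splitlines():
        line = line.strip()
        if line and not line.startswith(";"):
            steps += 1
    return steps


def shorter_than_optimal(llm_plan: str, optimal_plan: str) -> bool:
    """Return True if the LLM plan has fewer steps than the optimal plan.

    A plan shorter than an optimal plan cannot be correct, so True means
    the plan can be rejected without further inspection.
    """
    return count_steps(llm_plan) < count_steps(optimal_plan)


# Illustrative plans only; the action names are made up for this example.
optimal = "(up r1 t1 t2)\n(paint r1 t3 white)\n(down r1 t2 t1)\n; cost = 3\n"
candidate = "(up r1 t1 t2)\n(paint r1 t3 white)\n"

if shorter_than_optimal(candidate, optimal):
    print("rejected: fewer steps than the optimal plan")
```

Note that this check only rules plans out. A plan with at least as many steps as the optimal one still has to be checked, either by hand or with a validator such as VAL.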