NVlabs / progprompt-vh

ProgPrompt for Virtualhome

evaluation metric #2

Open serenawame opened 10 months ago

serenawame commented 10 months ago

In run_eval.py:

```python
results["overall"] = {
    'PSR': sum(sr)/len(sr),
    "SR": sr.count(1.0)/len(sr),
    "Precision": 1-sum(unchanged_conds)/sum(total_unchanged_conds),
    "Exec": sum(exec_per_task)/len(exec_per_task)
}
```

Could you please explain which of these correspond to "SR", "Exec", and "GCR" in the paper? My understanding is that the paper's SR is computed from either "PSR" or "SR" here, and the paper's Exec comes from "Exec" in the code. But how is GCR obtained? Is it the same as "Precision", i.e., checking that the executor leaves the states that should remain unchanged throughout the whole execution untouched, and translating that into the overlap between the final achieved state g' and the ground-truth final state g?

ishikasingh commented 10 months ago

GCR (goal condition recall) = PSR (partial success rate). We additionally have a precision metric, which was close to 100% for all agents (meaning all agents mostly perform task-relevant actions only), so we didn't report it in the paper. Yes, it keeps track of the unchanged states and evaluates only based on the changes that happened in the final state over the course of the execution.
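For anyone else landing on this thread, here is a minimal, self-contained sketch (not the repo's actual code) of how the overall metrics in run_eval.py aggregate per-task results. The variable names mirror the snippet above, but their exact per-task semantics, the `overall_metrics` helper, and the example numbers are my assumptions for illustration:

```python
# Assumed per-task inputs (names mirror run_eval.py; semantics are
# inferred from the snippet above, not confirmed by the authors):
#   sr[i]                     fraction of goal conditions satisfied for task i
#   unchanged_conds[i]        count of should-stay-unchanged conditions disturbed in task i
#   total_unchanged_conds[i]  total should-stay-unchanged conditions in task i
#   exec_per_task[i]          fraction of generated actions that executed for task i

def overall_metrics(sr, unchanged_conds, total_unchanged_conds, exec_per_task):
    return {
        # PSR = GCR in the paper: mean fraction of satisfied goal conditions
        "PSR": sum(sr) / len(sr),
        # SR: fraction of tasks where every goal condition was satisfied
        "SR": sr.count(1.0) / len(sr),
        # Precision: 1 minus the fraction of should-stay-unchanged
        # conditions the agent disturbed (not reported in the paper)
        "Precision": 1 - sum(unchanged_conds) / sum(total_unchanged_conds),
        # Exec: mean executability of the generated programs
        "Exec": sum(exec_per_task) / len(exec_per_task),
    }

# Example with three hypothetical tasks:
print(overall_metrics(
    sr=[1.0, 0.5, 1.0],              # task 2 satisfied half of its goal conditions
    unchanged_conds=[0, 1, 0],       # task 2 disturbed one task-irrelevant condition
    total_unchanged_conds=[4, 5, 4],
    exec_per_task=[1.0, 0.8, 1.0],
))
# {'PSR': 0.833..., 'SR': 0.666..., 'Precision': 0.923..., 'Exec': 0.933...}
```

Under this reading, the paper's GCR is the code's "PSR", the paper's SR is "SR", the paper's Exec is "Exec", and "Precision" is the additional metric the reply above says went unreported.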