Open serenawame opened 10 months ago
GCR (goal condition recall) is the same as PSR (partial success rate). We additionally have a precision metric, which was mostly 100% for all agents (meaning all agents mostly perform task-relevant actions only), so we didn't report it in the paper. Yes, the evaluation keeps track of states that should remain unchanged and only scores the changes that happened in the final state after execution.
In `run_eval.py`:

```python
results["overall"] = {
    'PSR': sum(sr) / len(sr),
    "SR": sr.count(1.0) / len(sr),
    "Precision": 1 - sum(unchanged_conds) / sum(total_unchanged_conds),
    "Exec": sum(exec_per_task) / len(exec_per_task),
}
```

Could you please explain which of these corresponds to "SR", "Exec", and "GCR" in the paper? My understanding is that the paper's SR comes from either "PSR" or "SR" here, and that "Exec" in the paper is obtained from "Exec" in the code. But how is "GCR" computed? Is it the same as "Precision", i.e., checking whether the executor keeps the states that should remain unchanged throughout the execution unchanged, and translating that into the overlap between the final achieved state g' and the ground-truth final state g?
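To make the mapping concrete, here is a hedged sketch of how those four overall numbers could be assembled from per-task results. The function name `overall_metrics` and the toy input data are my own illustration, not from `run_eval.py`; only the aggregation formulas mirror the snippet above.

```python
def overall_metrics(sr, unchanged_conds, total_unchanged_conds, exec_per_task):
    """Aggregate per-task results into the four reported numbers (sketch).

    sr                    -- per-task fraction of satisfied goal conditions
                             (this per-task ratio is what PSR/GCR averages)
    unchanged_conds       -- per-task count of should-stay-unchanged conditions
                             that were violated
    total_unchanged_conds -- per-task count of should-stay-unchanged conditions
    exec_per_task         -- per-task executability (1.0 if every action ran)
    """
    return {
        # mean partial success ratio == goal condition recall
        "PSR": sum(sr) / len(sr),
        # strict success: fraction of tasks with ALL goal conditions met
        "SR": sr.count(1.0) / len(sr),
        # precision: how well untouched states stayed untouched
        "Precision": 1 - sum(unchanged_conds) / sum(total_unchanged_conds),
        # mean executability across tasks
        "Exec": sum(exec_per_task) / len(exec_per_task),
    }

# Toy example with three tasks (made-up numbers):
metrics = overall_metrics(
    sr=[1.0, 0.5, 1.0],
    unchanged_conds=[0, 1, 0],
    total_unchanged_conds=[4, 4, 4],
    exec_per_task=[1.0, 1.0, 0.8],
)
```

Under this reading, "Precision" is a separate metric from GCR: GCR/PSR measures recall of the goal conditions in the final state, while Precision penalizes touching states that should have been left alone.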