Open serenawame opened 10 months ago
GCR (goal condition recall) is the same as PSR (partial success rate). We additionally have a precision metric, which was mostly 100% for all agents (meaning all agents mostly perform task-relevant actions only), so we didn't report it in the paper. Yes, the evaluation keeps track of states that should remain unchanged and only scores the changes that happened in the final state after execution.
In `run_eval.py`:

```python
results["overall"] = {
    'PSR': sum(sr) / len(sr),
    "SR": sr.count(1.0) / len(sr),
    "Precision": 1 - sum(unchanged_conds) / sum(total_unchanged_conds),
    "Exec": sum(exec_per_task) / len(exec_per_task),
}
```

Could you please explain which of these corresponds to "SR", "Exec", and "GCR" in the paper? My understanding is that the paper's SR comes from either "PSR" or "SR" here, and that "Exec" in the paper is obtained from "Exec" in the code. But how is "GCR" computed? Is it the same as "Precision", i.e., checking whether the executor keeps the states that should remain unchanged throughout the execution unchanged, and translating that into the overlap between the final achieved state g' and the ground-truth final state g?
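To make the mapping concrete, here is a hedged sketch of how those four overall numbers could be assembled from per-task results. The function name `overall_metrics` and the toy input data are my own illustration, not from `run_eval.py`; only the aggregation formulas mirror the snippet above.

```python
def overall_metrics(sr, unchanged_conds, total_unchanged_conds, exec_per_task):
    """Aggregate per-task results into the four reported numbers (sketch).

    sr                    -- per-task fraction of satisfied goal conditions
                             (this per-task ratio is what PSR/GCR averages)
    unchanged_conds       -- per-task count of should-stay-unchanged conditions
                             that were violated
    total_unchanged_conds -- per-task count of should-stay-unchanged conditions
    exec_per_task         -- per-task executability (1.0 if every action ran)
    """
    return {
        # mean partial success ratio == goal condition recall
        "PSR": sum(sr) / len(sr),
        # strict success: fraction of tasks with ALL goal conditions met
        "SR": sr.count(1.0) / len(sr),
        # precision: how well untouched states stayed untouched
        "Precision": 1 - sum(unchanged_conds) / sum(total_unchanged_conds),
        # mean executability across tasks
        "Exec": sum(exec_per_task) / len(exec_per_task),
    }

# Toy example with three tasks (made-up numbers):
metrics = overall_metrics(
    sr=[1.0, 0.5, 1.0],
    unchanged_conds=[0, 1, 0],
    total_unchanged_conds=[4, 4, 4],
    exec_per_task=[1.0, 1.0, 0.8],
)
```

Under this reading, "Precision" is a separate metric from GCR: GCR/PSR measures recall of the goal conditions in the final state, while Precision penalizes touching states that should have been left alone.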