alexa / teach

TEACh is a dataset of human-human interactive dialogues to complete tasks in a simulated household environment.

Trajectories that raise an error are ignored #13

Open aleSuglia opened 2 years ago

aleSuglia commented 2 years ago

Hi @aishwaryap,

I was reading about the recent changes to the code reported in #10, and we unfortunately get results that differ substantially from yours. I started dissecting the code to understand the reason for these discrepancies. From my understanding of the inference_runner.py script, you spawn several processes, each handling a portion of the tasks. However, I can see that the exception handling logic simply ignores an instance that raises an error: https://github.com/emma-simbot/teach/blob/speaker_tokens/src/teach/inference/inference_runner.py#L130
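
To make the pattern concrete, here is a minimal sketch of what I mean (this is not the actual inference_runner.py code; `run_worker` and `process_instance` are placeholder names):

```python
# Simplified sketch of the pattern in question (placeholder names): a worker
# loops over its share of instances and silently skips any instance whose
# processing raises an exception.
def run_worker(instance_files, process_instance):
    results = []
    for instance_file in instance_files:
        try:
            results.append(process_instance(instance_file))
        except Exception:
            # The failed instance is dropped entirely, so it never
            # contributes to the aggregated metrics.
            continue
    return results
```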

This is detrimental because if a dataset instance errors for whatever reason, its contribution to the overall metrics is lost. Instead, the proper way to handle this would be to skip that trajectory but still record it in the metrics as unsuccessful. Ideally, such faulty trajectories would also be reported in the metrics file for future debugging.
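
Something along these lines is what I would expect instead (again only a sketch with hypothetical names, not a patch against the repo):

```python
# Sketch of the handling I'd expect (hypothetical names): a trajectory that
# raises is still recorded as unsuccessful so it counts towards the aggregated
# metrics, and its identifier is kept for later debugging.
def run_worker(instance_files, process_instance):
    metrics = {}
    failed_instances = []
    for instance_file in instance_files:
        try:
            metrics[instance_file] = process_instance(instance_file)
        except Exception as e:
            # Count the trajectory as a failure instead of dropping it.
            metrics[instance_file] = {"success": 0, "error": str(e)}
            failed_instances.append(instance_file)
    return metrics, failed_instances
```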

Am I missing something?

aishwaryap commented 2 years ago

Hi @aleSuglia

There is a deliberate separation here between possible errors from the model and errors from the rest of the inference code. Most of the error handling is inside InferenceRunner._run_edh_instance (here), where exceptions from invocations of the model should get logged in the metrics. Are you seeing specific errors elsewhere that are not getting caught?
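
To illustrate the intended separation, here is a rough sketch (simplified, with placeholder names and signatures rather than the actual ones in the repo):

```python
# Simplified sketch of the intent (placeholder names): exceptions raised while
# invoking the model are caught inside the per-instance run, and the instance
# is still written out with failure metrics rather than being dropped.
def run_edh_instance(edh_instance, model, max_steps):
    metrics = {"success": 0, "error": None}
    try:
        for _ in range(max_steps):
            action = model.get_next_action(edh_instance)  # may raise
            # ... execute the action in the simulator, check task completion ...
    except Exception as e:
        # Model errors are recorded in the per-instance metrics instead of
        # silently discarding the instance.
        metrics["error"] = str(e)
    return metrics
```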

That said, we probably should return the list of failed files from InferenceRunner._run() for debugging. I will work on this on Monday.
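
The shape of that change would be roughly the following (hypothetical names, just to show where the failed files would surface):

```python
# Sketch of the planned change (hypothetical names): aggregate per-worker
# metrics and also collect the files that failed, so callers can inspect them.
def run_all(worker_results):
    all_metrics = {}
    failed_files = []
    for metrics, failures in worker_results:
        all_metrics.update(metrics)
        failed_files.extend(failures)
    return all_metrics, failed_files
```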

For leaderboard evaluations, we are separately confirming that all inference files did get saved.
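
That check is essentially a comparison between the expected instance ids and the prediction files actually written to disk, along the lines of this sketch (illustrative names only, not the leaderboard code):

```python
import os

# Illustrative completeness check: report any expected instance id that has no
# corresponding saved inference file in the output directory.
def check_all_saved(expected_instance_ids, output_dir):
    saved = {os.path.splitext(f)[0] for f in os.listdir(output_dir)}
    missing = sorted(set(expected_instance_ids) - saved)
    if missing:
        print(f"{len(missing)} instances missing predictions, e.g. {missing[:10]}")
    return missing
```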