facebookresearch / phyre

PHYRE is a benchmark for physical reasoning.
https://phyre.ai
Apache License 2.0

How is the simulated video length related to the object states? #78

Closed: Wuziyi616 closed this issue 2 years ago

Wuziyi616 commented 2 years ago

Hi, thanks for open-sourcing this great work. I have a question regarding the simulated video length. I went through several videos generated by

sim = simulator.simulate_action(task_id, act, stride=60, ...)
video = sim.images

The length of the generated videos varies a lot (from about 6 to 18 frames). How is the length determined? From looking at the videos, it seems that each video ends 3 seconds after the objects of interest (the green one and the blue/purple one) reach a stable state (e.g. they stop moving and are in contact). Is this observation correct? Thanks!
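As a rough sketch of why the length varies: if the recorded video keeps one frame every `stride` simulation steps (starting from step 0), its length is set by how many steps the simulation ran before it stopped. The helper `num_frames` below is hypothetical and not part of the PHYRE API. The 60-steps-per-second rate mentioned in the comment is also an assumption.

```python
def num_frames(num_steps: int, stride: int) -> int:
    """Hypothetical helper: count frames when recording steps 0, stride, 2*stride, ...

    Assumes a frame is kept at every step index that is a multiple of `stride`
    and strictly below `num_steps`. If the simulator ran at an assumed 60 steps
    per second, stride=60 would mean roughly one frame per simulated second.
    """
    return len(range(0, num_steps, stride))


if __name__ == "__main__":
    print(num_frames(360, 60))   # a run that stopped after 360 steps
    print(num_frames(1080, 60))  # a run that stopped after 1080 steps
```

Under these assumptions, shorter videos simply mean the simulation terminated earlier, not that frames were dropped.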

Wuziyi616 commented 2 years ago

I went through some tasks in the test set of the within protocol, fold 0. Some task-action pairs labeled as SOLVED look very odd: the simulated videos seem to end before the objects reach a stable state, which would mean the labels are wrong.

Examples:

My conjecture is that the simulator sees the cyan and blue objects in contact for 3 s, so it ends the simulation and assigns a SOLVED label. The good news is that most of these errors (if they really are errors) seem to occur only in this one task family.

Another possibility is that I misunderstood the goal condition. Does the goal state have to be a stable state, or is it still considered a success if the two target objects touch each other once and then separate?

akhti commented 2 years ago

Hi there! A task is marked as solved if the two target objects stay in contact for 3 seconds. This does allow some odd solutions where the objects are not stable but still remain in contact for 3 seconds. That is pretty rare, though.
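The rule described here can be sketched as an early-stopping loop over per-step contact flags. This is a hypothetical illustration, not PHYRE's implementation. It assumes the 3 seconds of contact must be consecutive, and the values `hold_steps=180` (3 s at an assumed 60 steps per second) and `max_steps` are placeholders.

```python
from typing import Iterable, Tuple


def run_until_solved(contact_flags: Iterable[bool],
                     hold_steps: int,
                     max_steps: int) -> Tuple[bool, int]:
    """Hypothetical sketch: stop once contact has lasted `hold_steps` consecutive steps.

    `contact_flags` yields, for each simulation step, whether the two target
    objects are touching. Returns (solved, number_of_steps_simulated).
    """
    consecutive = 0
    steps = 0
    for touching in contact_flags:
        if steps >= max_steps:
            break  # step budget exhausted without meeting the goal
        steps += 1
        # Any break in contact resets the counter (assumed consecutive rule).
        consecutive = consecutive + 1 if touching else 0
        if consecutive >= hold_steps:
            return True, steps  # goal met: the simulation ends here
    return False, steps


if __name__ == "__main__":
    # Objects touch for exactly 3 s (assumed 180 steps), then would separate.
    flags = [False] * 100 + [True] * 180 + [False] * 500
    print(run_until_solved(flags, hold_steps=180, max_steps=1000))
```

Because the loop returns as soon as the hold is reached, whatever the objects would do afterwards (such as separating) is never simulated, which is consistent with the short SOLVED videos described earlier in the thread.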

Wuziyi616 commented 2 years ago

I agree, it is indeed very rare. I trained a model that looks only at the last frame to determine whether the task is solved, and it achieves 99% accuracy. Thanks for the answer!
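The last-frame idea can also be illustrated without any learning: check whether a green pixel is directly adjacent to a blue or purple pixel in the final frame. This is a hand-crafted stand-in, not the model mentioned above. It assumes each frame is a 2-D array of color indices with green = 2, blue = 3 and purple = 4; those values should be checked against PHYRE's own color constants before use.

```python
import numpy as np

# Assumed color indices; verify against PHYRE's color definitions.
GREEN, BLUE, PURPLE = 2, 3, 4


def last_frame_touching(video: np.ndarray,
                        a: int = GREEN,
                        targets: tuple = (BLUE, PURPLE)) -> bool:
    """Hypothetical heuristic: do color `a` and any `targets` color touch in the last frame?

    `video` is assumed to have shape (T, H, W) with integer color indices.
    "Touching" means 4-neighbour pixel adjacency; diagonals do not count.
    """
    frame = video[-1]
    mask_a = frame == a
    mask_b = np.isin(frame, targets)
    # Grow mask_a by one pixel up, down, left and right.
    grown = mask_a.copy()
    grown[1:, :] |= mask_a[:-1, :]
    grown[:-1, :] |= mask_a[1:, :]
    grown[:, 1:] |= mask_a[:, :-1]
    grown[:, :-1] |= mask_a[:, 1:]
    return bool(np.any(grown & mask_b))
```

Rasterization can leave a one-pixel gap between objects that are touching in the physics engine, so a rule like this would be expected to miss some cases that a trained model can handle.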