Closed: Wuziyi616 closed this issue 2 years ago
I went through some tasks in the `within` protocol, fold 0 test set data. I found some task-action pairs labeled as SOLVED that look very weird: the simulated videos end before the objects reach stable states, leading to wrong labels.
Examples:
My conjecture is that the simulator sees the cyan and the blue object staying in contact for 3 seconds, so it assumes the simulation should end and assigns a SOLVED label? The good news is that most such errors (if they indeed are errors) seem to happen only in this one task family.
Another possibility is that I misunderstood the goal condition. Should the goal state be a stable state, or is it still considered a success if the two target objects touch each other once and then separate?
Hi there! The condition for a task being marked as SOLVED is whether the two target objects stay in a touching relation for 3 seconds. This does allow for some weird solutions where the objects are not stable but are still in contact for 3 seconds. That is pretty rare, though.
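The condition above (3 consecutive seconds of contact, regardless of stability) can be sketched as a check over per-frame contact flags. This is a minimal sketch, not the actual simulator code: the `FPS` value, the `contact_per_frame` input format, and the function name are all assumptions for illustration.

```python
FPS = 60            # assumed frame rate; the real simulator's rate may differ
TOUCH_SECONDS = 3   # contact duration required for SOLVED, per the answer above

def is_solved(contact_per_frame, fps=FPS, touch_seconds=TOUCH_SECONDS):
    """Return True if the two target objects stay in contact for
    `touch_seconds` consecutive seconds at any point in the rollout.

    `contact_per_frame` is a list of booleans, one per simulation frame,
    indicating whether the targets touch in that frame (hypothetical input;
    the actual simulator exposes contact information differently).
    """
    needed = fps * touch_seconds
    run = 0  # length of the current streak of touching frames
    for touching in contact_per_frame:
        run = run + 1 if touching else 0
        if run >= needed:
            return True
    return False
```

Under this rule, objects that bounce apart before accumulating a full 3-second streak would not count as SOLVED, which matches the "touch once and then separate" case asked about above.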
I agree, it's indeed very rare. I trained a model that only looks at the last frame to determine whether the task is solved, and it achieves 99% accuracy. Thanks for the answer!
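The observation above suggests the last frame alone is highly predictive. A minimal way to measure that, assuming per-sample (last-frame contact, label) pairs are available (a hypothetical data format, not the actual dataset layout), would be:

```python
def last_frame_accuracy(samples):
    """Accuracy of the 'predict SOLVED iff the targets touch in the last
    frame' heuristic.

    `samples` is a list of (last_frame_touching, solved) boolean pairs
    (hypothetical format; real data would come from simulator rollouts).
    """
    correct = sum(1 for touching, solved in samples if touching == solved)
    return correct / len(samples)
```

A high score here would be consistent with the rarity of unstable-but-touching endings discussed above, since the vast majority of rollouts end with the targets either clearly touching or clearly apart.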
Hi, thanks for open-sourcing this great work. I have a question regarding the simulated video length. I went through several videos generated by
The lengths of the generated videos vary a lot (from about 6 to 18 seconds or so), so I wonder how the length is determined? Looking at the videos, it seems each one ends 3 seconds after the objects of interest (the green and blue/purple ones) reach a stable state (e.g., stop moving while in contact)? Is this observation correct? Thanks!