DP3 baseline not generalizing.

YanjieZe / 3D-Diffusion-Policy

[RSS 2024] 3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations

https://3d-diffusion-policy.github.io

MIT License

DP3 baseline not generalizing. #49

Closed by dennisushi 4 months ago

dennisushi commented 4 months ago

[Image: 2024-07-04_14-34]

Hi, I am training the model on the simplest possible real-world task: reaching an object. The robot is a 7-DoF Panda controlled via end-effector pose (world-frame XYZ-RPY, 6 action dimensions). I train SimpleDP3 on 40 demonstrations (with the default 0.02 validation ratio from your configs) until the test score converges near 0 (about 2k epochs).

As you can see in the plot, one of the trajectories (which I believe is assigned as a validation trajectory) is incorrect. My main suspicion is that my data is non-Markovian, since I record it continuously with real-time teleoperation (so it contains, e.g., short pauses in the movement), and that the model cannot handle that. If this is the case, a longer time horizon may partially fix it.
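One way to test the pause hypothesis is to strip near-stationary frames from each demonstration before training and see whether the behaviour changes. The following is a minimal sketch of that idea, not part of DP3; the function name `drop_pauses`, the `min_step` threshold, and the toy data are all hypothetical, and it looks only at XYZ position (orientation changes are ignored).

```python
import math

def drop_pauses(positions, min_step=1e-3):
    """Return indices of frames to keep.

    A frame is kept only if the end effector has moved at least
    `min_step` (same unit as the positions, e.g. metres) since the
    last kept frame. `min_step` is an illustrative value, not a tuned one.
    """
    if not positions:
        return []
    kept = [0]  # always keep the first frame
    for i in range(1, len(positions)):
        if math.dist(positions[i], positions[kept[-1]]) >= min_step:
            kept.append(i)
    return kept

# Toy demonstration: the operator pauses for two frames in the middle.
demo_xyz = [
    (0.00, 0.0, 0.0),
    (0.01, 0.0, 0.0),
    (0.02, 0.0, 0.0),
    (0.02, 0.0, 0.0),  # pause
    (0.02, 0.0, 0.0),  # pause
    (0.03, 0.0, 0.0),
]
kept_indices = drop_pauses(demo_xyz)
```

The returned indices can then be used to subsample the observations and actions of the same episode consistently. Because the actions here are absolute end-effector poses, dropping frames should leave the remaining targets valid, though this is worth checking against the actual recording format.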

Is the test score based on the validation trajectories? It doesn't match what I observe. Is this kind of overfitting to the demonstrations normal for the model?
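For reference, a validation ratio of 0.02 on 40 demonstrations leaves very little held-out data. The sketch below assumes the split is made per episode and always holds out at least one episode; that is an assumption about the data loader, not something confirmed in this thread, and `split_episodes` is a hypothetical helper.

```python
import random

def split_episodes(n_episodes, val_ratio, seed=42):
    """Split episode indices into train and validation sets.

    Assumption: at least one episode is held out when val_ratio > 0,
    and at least one episode is always left for training.
    """
    if val_ratio <= 0:
        return list(range(n_episodes)), []
    n_val = min(max(1, round(n_episodes * val_ratio)), n_episodes - 1)
    rng = random.Random(seed)
    val_ids = sorted(rng.sample(range(n_episodes), n_val))
    train_ids = [i for i in range(n_episodes) if i not in val_ids]
    return train_ids, val_ids

train_ids, val_ids = split_episodes(n_episodes=40, val_ratio=0.02)
print(len(train_ids), len(val_ids))
```

Under these assumptions, 40 × 0.02 = 0.8 rounds to a single validation episode, so any validation metric would rest on one trajectory.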

YanjieZe commented 4 months ago

Hi, thank you for your interest!

Your experiments are actually interesting. From what I can see in the figure, the line you describe as incorrect seems to be correct in its earlier part but becomes messy toward the end, right? I am not sure this shows "not generalizing". Instead, I think that trajectory is simply hard to predict (it has a sudden twist).

I think one property of diffusion policies (DP, DP3, and other variants) is that they make the predicted trajectory smooth. The phenomenon you see might be caused by this property.

dennisushi commented 4 months ago

The bad predictions start well before the twist (see the rightmost part of the rightmost plot), and the twist isn't sudden in time, only in space. It looks more like a generalization issue, as I do not observe task-oriented behaviour when new data is presented, which means the data is either insufficient or in an incorrect format.
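To make "start well before the twist" checkable rather than read off a plot, one could compute the per-step position error between the predicted and demonstrated trajectories and report the first step that exceeds a tolerance. This is only an illustrative sketch: `first_divergence`, the `tol` value, and the toy trajectories are hypothetical, and it assumes both trajectories are sampled at the same timesteps.

```python
import math

def first_divergence(predicted, reference, tol=0.02):
    """Return (index, errors): the first step whose Euclidean position
    error exceeds `tol`, or None if no step does, plus all per-step errors."""
    errors = [math.dist(p, r) for p, r in zip(predicted, reference)]
    for i, err in enumerate(errors):
        if err > tol:
            return i, errors
    return None, errors

# Toy data: the reference twists at the last step,
# but the prediction already drifts one step earlier.
reference = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.0), (0.3, 0.1, 0.0)]
predicted = [(0.0, 0.0, 0.0), (0.1, 0.01, 0.0), (0.2, 0.05, 0.0), (0.3, 0.0, 0.0)]

diverge_idx, step_errors = first_divergence(predicted, reference)
```

Comparing `diverge_idx` with the step at which the twist begins would show whether the error precedes the twist or coincides with it.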

kingchou007 commented 4 months ago

Hi @dennisushi

Did you evaluate DP3 in the real world? I've tried implementing DP3 on a real-world task, but I'm struggling to achieve any meaningful results.

I don't know what I am missing or doing wrong that leads to this extremely low success rate.

dennisushi commented 4 months ago

@kingchou007 Hi, I tried to evaluate in the real world. The behaviour I am getting is consistently bad. If you want, we can chat and figure out together where we are failing.

kingchou007 commented 4 months ago

Hi @dennisushi,

Sure, I've sent an e-mail to your Gmail.