buoyancy99 / diffusion-forcing

code for "Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion"

How do frame_stack, action_stack, and n_frames settings affect the performance? #22

Closed le-wei closed 1 week ago

le-wei commented 3 weeks ago

Hello, sorry to bother you again. Recently, I have been studying your paper, and I have a question regarding its application in the field of imitation learning. How do frame_stack, action_stack, and n_frames settings affect the performance? If the data is collected at a frequency of 30 Hz, what would be the optimal settings for frame_stack, action_stack, and n_frames? Additionally, I would like to know if it is necessary to add a separate loss function to constrain the generated actions. I greatly appreciate your response.

buoyancy99 commented 1 week ago

I believe the action chunk size from ACT or Diffusion Policy should serve as a good action_stack size to begin with, and you can try something smaller or bigger from there. The action stack basically says that if your actions are highly correlated locally, we don't need to waste compute by creating many tokens for them.
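
To make the token-saving idea concrete, here is a minimal sketch of what stacking actions could look like. The helper name `stack_actions`, the 7-dimensional action, and the chunk size of 10 are illustrative assumptions, not values taken from the repo or recommended in this thread.

```python
import numpy as np

def stack_actions(actions: np.ndarray, action_stack: int) -> np.ndarray:
    """Group consecutive actions so that each group becomes one token.

    actions: array of shape (T, action_dim).
    Returns an array of shape (T // action_stack, action_stack * action_dim).
    Trailing steps that do not fill a whole token are dropped.
    """
    T, action_dim = actions.shape
    n_tokens = T // action_stack
    trimmed = actions[: n_tokens * action_stack]
    # Row-major reshape keeps the actions of one chunk in time order.
    return trimmed.reshape(n_tokens, action_stack * action_dim)

# Hypothetical episode: 3 seconds recorded at 30 Hz with a 7-dimensional action.
actions = np.zeros((90, 7))
tokens = stack_actions(actions, action_stack=10)
print(tokens.shape)  # (9, 70)
```

With these made-up numbers the model sees 9 tokens instead of 90, which is the compute saving described above.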

n_frames is largely dependent on the task horizon. However, to achieve compositionality instead of overfitting to long horizons, there are some special techniques uniquely enabled by DF. This will be the main research question for DF2.
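
As a rough way to relate n_frames to the task horizon at a given recording rate, the arithmetic can be sketched as below. This assumes n_frames counts raw timesteps and that one token covers frame_stack of them. That is an assumption about the config, not something stated in this thread, and `horizon_summary` is a hypothetical helper.

```python
def horizon_summary(n_frames: int, frame_stack: int, hz: float) -> dict:
    """Relate a sequence length in raw frames to tokens and wall-clock time.

    Assumes (hypothetically) that one token covers `frame_stack` raw frames.
    """
    if n_frames % frame_stack != 0:
        raise ValueError("n_frames should be a multiple of frame_stack")
    return {
        "n_tokens": n_frames // frame_stack,      # sequence length the model sees
        "seconds": n_frames / hz,                 # task horizon covered
        "seconds_per_token": frame_stack / hz,    # time span of one token
    }

# Illustrative numbers only: 30 Hz data, a 10 second task, 10 frames per token.
summary = horizon_summary(n_frames=300, frame_stack=10, hz=30.0)
print(summary)
```

Picking n_frames this way only matches the sequence length to the task duration; it does not address the compositionality question mentioned above.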

Adding a separate loss function to constrain the actions is definitely doable.
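
One possible shape for such an extra term is a weighted penalty added to the main training loss. The sketch below uses NumPy only to show the arithmetic; the bound-violation penalty, the [-1, 1] limits, the weight of 0.1, and both function names are assumptions chosen for illustration, not the repo's implementation. In actual training the same expression would be written with the framework's differentiable tensor operations.

```python
import numpy as np

def action_constraint_penalty(actions: np.ndarray, low: float, high: float) -> float:
    """Mean squared amount by which actions fall outside [low, high]."""
    below = np.clip(low - actions, 0.0, None)   # positive where action < low
    above = np.clip(actions - high, 0.0, None)  # positive where action > high
    return float(np.mean(below ** 2 + above ** 2))

def total_loss(diffusion_loss: float, actions: np.ndarray,
               low: float = -1.0, high: float = 1.0, weight: float = 0.1) -> float:
    """Main loss plus a weighted action-constraint term (hypothetical weighting)."""
    return diffusion_loss + weight * action_constraint_penalty(actions, low, high)

# Made-up predicted actions; two entries violate the assumed [-1, 1] bounds.
pred_actions = np.array([[0.5, -1.5], [1.2, 0.0]])
loss = total_loss(diffusion_loss=0.3, actions=pred_actions)
print(loss)
```

The weight trades off the constraint against the main objective and would need tuning for a given task.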

le-wei commented 1 week ago

"Thank you very much for your explanation. I will give it a try. Thanks."