Liang-ZX / AdaptDiffuser

[ICML'2023] "AdaptDiffuser: Diffusion Models as Adaptive Self-evolving Planners"
https://arxiv.org/abs/2302.01877
MIT License

question about reward guidance and dynamics discriminator #4

Closed · return-sleep closed this issue 5 months ago

return-sleep commented 9 months ago

Thanks for your excellent work. After reading the paper, I have a few points of confusion about the reward guidance and the dynamics discriminator on locomotion tasks.

  1. Dynamics discriminator: How is the dynamics transition model T obtained, so that the next state can be predicted from the current state and action as s_{t+1} = T(s_t, a_t)? Do you execute the action and interact with the environment online?
  2. Reward function: For a goal-conditioned task, the reward can be computed as the distance to the goal state. How do you obtain the reward function for locomotion tasks and other unseen tasks? Does this require more information than Diffuser?
  3. Judging from the experiments, is the generalization ability of this work limited to tasks where the reward function is known in advance (goal-conditioned tasks)?

In summary, I am very curious whether the generalization and adaptability of AdaptDiffuser rely on extra knowledge of the task, such as the transition model T and predefined reward functions, which are not available to other methods.

Looking forward to your reply!

Looomo commented 9 months ago

Same question here. Have you found the code for the dynamics discriminator?

Liang-ZX commented 9 months ago

  1. In different settings we use different transition models. For Kuka, the environment wrapper separates environment interaction from the position transition of the robot arm, so we use env.step (the function provided in our code) to get the next state; no online environment interaction is involved.

  2. For locomotion, an additional reward model is required, learned from the existing offline data. We follow a setting similar to https://github.com/jannerm/diffuser/blob/main/diffuser/models/diffusion.py#L235 (see the sketch after this list).

  3. Following the replies above, the approach is feasible for unknown reward functions and transition models, but additional networks are required to capture those patterns.
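
To make point 2 more concrete, here is a rough sketch of what reward guidance with a learned return model looks like during sampling. The `p_mean_variance` and `value_fn` interfaces below are placeholders for illustration, not the exact AdaptDiffuser code; please refer to the linked diffuser file for the real implementation.

```python
import torch

def guided_p_sample(diffusion, value_fn, x_t, t, scale=0.1):
    """One reverse-diffusion step nudged by a learned return model
    (classifier-style guidance). `diffusion.p_mean_variance` and
    `value_fn` are assumed interfaces for this sketch."""
    # Gradient of the predicted return w.r.t. the noisy trajectory.
    with torch.enable_grad():
        x = x_t.detach().requires_grad_(True)
        value = value_fn(x, t).sum()
        grad = torch.autograd.grad(value, x)[0]

    # Standard reverse step from the diffusion model ...
    mean, var = diffusion.p_mean_variance(x_t, t)
    # ... shifted toward trajectories with higher predicted return.
    mean = mean + scale * var * grad

    # No noise is added at the final timestep (t == 0).
    noise = torch.randn_like(x_t)
    nonzero = (t > 0).float().reshape(-1, *([1] * (x_t.ndim - 1)))
    return mean + nonzero * var.sqrt() * noise
```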

Thank you!

Looomo commented 9 months ago

Thanks for the reply! One more question: is the transition model trained separately from the trajectory generation model? Also, when will the code for the locomotion tasks be available?

Liang-ZX commented 5 months ago

For a simplified dynamics model, you can refer to https://github.com/anagabandi/nn_dynamics/blob/master/dynamics_model.py
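
A stripped-down sketch of such a dynamics model is below: an MLP trained on the offline data to predict the state change, which can then score how consistent a generated transition (s_t, a_t, s_{t+1}) is with the dynamics without touching the environment. The class name, layer sizes, and helper function are illustrative only; see the linked repository for a complete implementation.

```python
import torch
import torch.nn as nn

class DynamicsModel(nn.Module):
    """Minimal MLP dynamics model: predicts s_{t+1} from (s_t, a_t)
    by learning the state delta from offline data (illustrative sketch)."""

    def __init__(self, state_dim, action_dim, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, state_dim),
        )

    def forward(self, state, action):
        delta = self.net(torch.cat([state, action], dim=-1))
        return state + delta

def transition_error(model, state, action, next_state):
    """L2 discrepancy used to filter implausible generated transitions."""
    with torch.no_grad():
        pred = model(state, action)
    return (pred - next_state).pow(2).mean(dim=-1)
```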

The code will be released after the updated journal version of this paper is out. Thanks a lot.