Closed freiberg-roman closed 2 years ago
Hi @freiberg-roman, this is awesome! Thanks a lot for your interest and contribution. I'll take a closer look soon. As a high level question, have you tested the PDDM implementation in some environments already? Do you have any plots you can share?
Also, from a very quick glance it looks like formatting can be improved in some places. Please install pre-commit to run some hooks automatically for you before a commit (sorry, I should have added instructions for this in the README).
Hi @freiberg-roman, this is awesome! Thanks a lot for your interest and contribution. I'll take a closer look soon. As a high level question, have you tested the PDDM implementation in some environments already? Do you have any plots you can share?
Also, from a very quick glance it looks like formatting can be improved in some places. Please install pre-commit to run some hooks automatically for you before a commit (sorry, I should have added instructions for this in the README).
cleaned up a bit. I'm currently benchmarking on common MuJoCo environments and will probably give an update next week. On trivial environments like CartPole it seems to run well.
Hi @freiberg-roman. I took a look at the sequential SequenceTransitionIterator
, which is actually something I was about to start writing for a PlaNet implementation in #86. I like most of the logic of what you did, but have some comments about how to structure it differently. We could have some discussion about this here, but I'm also thinking it would be easier if I just take your code as a starting point and make the changes. I can do a PR to your fork (just for the replay buffer changes) so you can give some input, as it will probably mean you'll have to change some of your PDDM code; once it's ready we can merge the new replay buffer into master from your fork. How does that sound?
Hi @freiberg-roman. I took a look at the sequential
SequenceTransitionIterator
, which is actually something I was about to start writing for a PlaNet implementation in #86. I like most of the logic of what you did, but have some comments about how to structure it differently. We could have some discussion about this here, but I'm also thinking it would be easier if I just take your code as a starting point and make the changes. I can do a PR to your fork (just for the replay buffer changes) so you can give some input, as it will probably mean you'll have to change some of your PDDM code; once it's ready we can merge the new replay buffer into master from your fork. How does that sound?
Sure, I'm planning a proper rework anyway, so it would be nice if we could find a common API for the replay buffer now. This version is more or less a hack to get it running.
@freiberg-roman I made a new pull request for the sequential iterator (#91), would you mind taking a look? I ended up only using a small portion of your code, since I did some refactoring of the base classes to make it cleaner.
Note that this iterator returns an extra dimension for the sequence, so it's a bit different than what you currently have in this PR. You will need to change a few things in your PDDM code, but hopefully just some reshapes in the right places.
@luisenp hello, sorry for not giving any update had a lot going on in the last few weeks. Unfortunately there were some bugs and I have to restart my experiments. I already had good results with ground truth dynamics (also added a small notebook to play around with) and am in the process of investigating most components separately to get the reported performance out of PDDM. This could probably take a month. I think the code is mostly done in terms of structure. If you would like, we could work on finalizing this PR and I could start a new one just for tuning. What do you think about this?
Hi @freiberg-roman, thanks for the update! I've also been busy with other things which is why I haven't done a great job at following up on PRs lately, but things should clear up after this week. Let me start by playing around a bit with your code first and run the examples you currently have, and we can go from there. One possibility would be to merge the trajectory optimizer separately if it's working as expected with ground truth dynamics, and defer integrating PDDM until we have a bit more understanding of the issues.
Hi @freiberg-roman. Sorry for the delay, I just noticed that the last thing I said was that I would take a look at your examples, and somehow I thought I had asked you a question and was waiting for answer. In any case, is this the notebook to test on ground truth dynamics?
Hi @luisenp. Sorry, I'm short on time at the moment, but I'm still working on it (hope it's not time critical). I have been testing components in a separate repository and will start reintegrating the changes soon. I will share my W&B reports when everything is ready. This notebook is similar to the one I am currently using for ground truth dynamics.
I have a question about the PETS HalfCheetah environment. What is the reason for adding sine and cosine to the 2nd and 3rd observation dimension?
Hi @freiberg-roman, thanks for the update, and looking forward to seeing the changes, and let me know if there is anything I can do to help :)
With regards to the PETS question, this comes from the original implementation of PETS, and we kept it here mostly for the sake of completeness and for debugging our implementation. If I'm not mistaken, the rationale is that it makes it easier to learn angles, since the model doesn't have to learn something like 0 being close to 2 * pi. Personally, if I were to properly benchmark different MBRL algorithms, I'd do it in the unmodified mujoco-gym versions, and not on custom versions such as this one.
Hi @freiberg-roman. This PR is quite big now :) Are there any sub components that are stable and would be ready to integrate in master? We could make separate PRs for some of them and start adding some of your contributions to master. For example the MPPI optimizer?
Hi @luisenp. Good idea, I wasn't aware of the LOC count. Did a small PR just for one environment #119 . The performance with GT is similar to the notebook version added here.
This PR is quite outdated. I think it would be better to continue with other PR's first :))
(WIP) PDDM implementation
Motivation and Context / Related issue
PR for PDDM's MPPI planner, support for sequenced batches, and in the near future proper settings and benchmarks for MuJoCo environments.
Checklist