Closed: b18arundhati closed this 4 months ago
Hi, can you provide me with more details? Which environment is this and did you use the suggested commands? Did you train it to ~100k steps?
Sorry about the issue; this is confirmed to be a bug introduced while cleaning up the code. I will push the fix next week, along with a much-improved version featuring my new transformer implementation of df_planning, with much better results and much faster speed!
It turns out I need more time for the transformer version release. However, I can also reproduce the result with a simple change to the normalization:
First, git pull the repo to the latest version.
Then train for 100k steps with
python -m main +name=original_medium_sample20 experiment=exp_planning dataset=maze2d_medium algorithm=df_planning dataset.action_std=[2,2]
Test by adding
load={wandb_run_id} algorithm.guidance_scale=10 experiment.tasks=[validation]
(The guidance scale can be tuned between 10 and 20.)
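Putting the pieces together, the full validation command would look like the sketch below; replace {wandb_run_id} with your own run ID, and the small loop at the end is just one way to sweep the suggested 10 to 20 range:

```bash
# Full validation run (replace {wandb_run_id} with your own wandb run ID)
python -m main +name=original_medium_sample20 experiment=exp_planning \
  dataset=maze2d_medium algorithm=df_planning dataset.action_std=[2,2] \
  load={wandb_run_id} algorithm.guidance_scale=10 experiment.tasks=[validation]

# Optionally sweep a few guidance scales in the suggested 10-20 range
for scale in 10 15 20; do
  python -m main +name=original_medium_sample20 experiment=exp_planning \
    dataset=maze2d_medium algorithm=df_planning dataset.action_std=[2,2] \
    load={wandb_run_id} algorithm.guidance_scale=$scale experiment.tasks=[validation]
done
```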
I also trained a fresh checkpoint on my side with this exact command. Download it from Google Drive to your project root.
Then extract it with
tar -xzvf medium_a2std_ckpt.tar.gz
and test with
python -m main +name=original_medium_sample20 experiment=exp_planning dataset=maze2d_medium algorithm=df_planning dataset.action_std=[2,2] load=outputs/medium_a2std.ckpt algorithm.guidance_scale=20 experiment.tasks=[validation]
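Assuming the archive has already been downloaded into the project root, the whole extract-and-test flow can be scripted roughly like this:

```bash
#!/usr/bin/env bash
# Assumes medium_a2std_ckpt.tar.gz has already been downloaded from the
# Google Drive link into the project root.
set -euo pipefail

# Extract the archive; the test command below expects outputs/medium_a2std.ckpt
tar -xzvf medium_a2std_ckpt.tar.gz

# Evaluate the pretrained checkpoint with guidance scale 20
python -m main +name=original_medium_sample20 experiment=exp_planning \
  dataset=maze2d_medium algorithm=df_planning dataset.action_std=[2,2] \
  load=outputs/medium_a2std.ckpt algorithm.guidance_scale=20 experiment.tasks=[validation]
```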
For those who come back to this post: I just released the transformer version on the main branch. Check it out; here are some visualizations of it.
It crashes when guidance_scale is set to non-zero (I have tried setting it to 1 and 10) for the planning experiment. Are there any other parameters that also need to be set for reward-guided planning to work?