Armandpl / dreamerv3

DreamerV3 + gSDE, using pytorch, on a real robot

Set up agent training inside world model #5

Closed. Armandpl closed this issue 6 months ago.

Armandpl commented 7 months ago

How should I store data in the replay buffer?
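One common answer for a Dreamer-style world model is a flat ring buffer that stores transitions in time order, with an `is_first` flag marking episode starts, and samples fixed-length contiguous chunks so the recurrent model trains on sequences. This is a minimal sketch assuming vector observations and continuous actions; `SequenceReplayBuffer`, its fields, and the batch/sequence sizes are hypothetical, not this repo's code.

```python
import numpy as np


class SequenceReplayBuffer:
    """Hypothetical ring buffer: stores single transitions in time order and
    samples contiguous (batch, seq_len) chunks for world-model training."""

    def __init__(self, capacity, obs_shape, action_dim):
        self.capacity = capacity
        self.obs = np.zeros((capacity, *obs_shape), dtype=np.float32)
        self.action = np.zeros((capacity, action_dim), dtype=np.float32)
        self.reward = np.zeros(capacity, dtype=np.float32)
        # Marks the first step of an episode so the recurrent state can be reset.
        self.is_first = np.zeros(capacity, dtype=bool)
        self.idx = 0  # next write position (also the oldest entry once full)
        self.full = False

    def __len__(self):
        return self.capacity if self.full else self.idx

    def add(self, obs, action, reward, is_first):
        self.obs[self.idx] = obs
        self.action[self.idx] = action
        self.reward[self.idx] = reward
        self.is_first[self.idx] = is_first
        self.idx = (self.idx + 1) % self.capacity
        self.full = self.full or self.idx == 0

    def sample(self, batch_size, seq_len, rng=None):
        rng = rng or np.random.default_rng()
        n = len(self)
        if n < seq_len:
            raise ValueError("not enough data for one sequence")
        # Offsets are taken relative to the oldest stored step, so a chunk never
        # straddles the write pointer where the newest data meets the oldest.
        oldest = self.idx if self.full else 0
        offsets = rng.integers(0, n - seq_len + 1, size=batch_size)
        starts = (oldest + offsets) % self.capacity
        # (batch, seq_len) grid of indices, wrapping around the array end.
        idxs = (starts[:, None] + np.arange(seq_len)[None, :]) % self.capacity
        return {
            "obs": self.obs[idxs],
            "action": self.action[idxs],
            "reward": self.reward[idxs],
            "is_first": self.is_first[idxs],
        }
```

Chunks can still cross episode boundaries; the `is_first` flags let the world model reset its recurrent state at those points instead of carrying it over from the previous episode.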

Armandpl commented 6 months ago

Alright, so we have proof of life:

https://github.com/Armandpl/minidream/assets/14967758/b3e1f0cd-c5a5-4193-9cad-3e57908b64b1

But it's unstable (chart from this run):

[W&B chart, 2/17/2024, 6:42:43 PM]

I set up actor-critic training directly on the env, bypassing the world model, and I can "train". It is only slightly more stable, but the scale of the losses is wrong. Training run here: [W&B chart, 2/17/2024, 6:50:46 PM]
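When loss scales look wrong, two pieces worth checking in isolation are the λ-return targets and the return normalization. This sketch computes standard TD(λ) returns backwards from a bootstrap value and, following the DreamerV3 paper's description, scales advantages by `max(1, S)`, where `S` is an exponential moving average of the 5th-to-95th percentile range of returns. The function and class names are hypothetical, the `gamma`/`lam`/`decay` defaults are assumptions, and index conventions (which value pairs with which reward) differ between implementations.

```python
import numpy as np


def lambda_returns(rewards, values, continues, gamma=0.997, lam=0.95):
    """TD(lambda) targets. rewards, continues: shape (T,); values: shape (T+1,),
    where values[T] is the bootstrap estimate. Defaults are assumptions."""
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    continues = np.asarray(continues, dtype=np.float64)
    T = len(rewards)
    returns = np.zeros(T, dtype=np.float64)
    next_ret = values[T]  # bootstrap from the last value estimate
    for t in reversed(range(T)):
        # Blend the one-step bootstrap with the longer lambda-return;
        # continues[t] == 0 cuts everything after a terminal step.
        next_ret = rewards[t] + gamma * continues[t] * (
            (1 - lam) * values[t + 1] + lam * next_ret
        )
        returns[t] = next_ret
    return returns


class ReturnNormalizer:
    """EMA of the 5th-95th percentile spread of returns; the divisor is clipped
    at 1 so small returns are not scaled up."""

    def __init__(self, decay=0.99):
        self.decay = decay
        self.scale = None

    def update(self, returns):
        spread = np.percentile(returns, 95) - np.percentile(returns, 5)
        if self.scale is None:
            self.scale = spread
        else:
            self.scale = self.decay * self.scale + (1 - self.decay) * spread
        return max(1.0, float(self.scale))
```

Usage would look like `adv = (ret - values[:-1]) / normalizer.update(ret)`. Because the divisor never drops below 1, tasks with small rewards keep their raw scale while large-return tasks get squashed, which is one way to keep actor loss magnitudes comparable across environments.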

Armandpl commented 6 months ago

Found and corrected a few bugs in the actor-critic code, but it still isn't working, so we need to:

Armandpl commented 6 months ago

https://wandb.ai/armandpl/minidream_dev/runs/l6970cbc/workspace?workspace=user-armandpl: it seems like we can reliably solve cartpole in ~60k steps. That seems like a lot of steps? Benchmark against SAC? Against other Dreamer implementations?
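To make a number like "~60k steps" comparable across SAC and other Dreamer implementations, one option is to log episode returns against env steps for every run and apply the same steps-to-threshold metric to all of them. A minimal sketch; `steps_to_solve`, the smoothing window, and the threshold are hypothetical and should match whatever "solved" means for this cartpole.

```python
def steps_to_solve(steps, returns, threshold, window=10):
    """Return the env step at which the mean of the last `window` episode
    returns first reaches `threshold`, or None if it never does.
    steps[i] is the cumulative env step count at the end of episode i."""
    if len(steps) != len(returns):
        raise ValueError("steps and returns must have the same length")
    running = 0.0
    for i, (step, ret) in enumerate(zip(steps, returns)):
        running += ret
        if i >= window:
            running -= returns[i - window]  # drop the episode leaving the window
        if i + 1 >= window and running / window >= threshold:
            return step
    return None
```

Smoothing over several episodes avoids declaring a task solved on one lucky episode; running each method over a few seeds and comparing the resulting step counts gives a fairer picture than a single run.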