setup agent training inside world model

Armandpl / dreamerv3

DreamerV3 + gSDE, using pytorch, on a real robot


setup agent training inside world model #5

Closed: Armandpl closed this issue 6 months ago

Armandpl commented 7 months ago

How should I store data in the replay buffer?

Assume only one env: insert data into one long sequence, sample random chunks from it, and handle dones (episode boundaries) in the training loop.
- How should I handle truncated flags? -> Don't count them as done unless time is observable?
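A minimal sketch of the single-env layout described above, assuming numpy arrays and a fixed capacity with no wrap-around. The class, method and field names are hypothetical, not this repo's code. It keeps `terminated` and `truncated` separate: both mark the next stored step as `is_first`, but only `terminated` zeroes the continue target, which is one way to implement "don't count truncation as done".

```python
import numpy as np


class SequenceReplayBuffer:
    """Hypothetical single-env replay: transitions go into one long sequence
    and training batches are random fixed-length chunks of it."""

    def __init__(self, capacity, obs_dim, act_dim, seed=0):
        self.capacity = capacity
        self.obs = np.zeros((capacity, obs_dim), dtype=np.float32)
        self.act = np.zeros((capacity, act_dim), dtype=np.float32)
        self.rew = np.zeros(capacity, dtype=np.float32)
        # True termination (e.g. the pole fell): the continue target is 0.
        self.terminated = np.zeros(capacity, dtype=bool)
        # Time-limit truncation: the episode ends but the state is not terminal.
        self.truncated = np.zeros(capacity, dtype=bool)
        # is_first marks the first step of each episode inside the long sequence.
        self.is_first = np.zeros(capacity, dtype=bool)
        self._next_is_first = True
        self.size = 0
        self.rng = np.random.default_rng(seed)

    def add(self, obs, act, rew, terminated, truncated):
        """Append one env step: the observation, the action taken from it,
        and the reward / end flags that the step produced."""
        if self.size == self.capacity:
            raise IndexError("buffer full (no wrap-around in this sketch)")
        i = self.size
        self.obs[i], self.act[i], self.rew[i] = obs, act, rew
        self.terminated[i], self.truncated[i] = terminated, truncated
        self.is_first[i] = self._next_is_first
        # Either kind of episode end means the next stored step starts a new episode.
        self._next_is_first = bool(terminated or truncated)
        self.size += 1

    def sample(self, batch_size, seq_len):
        """Random chunks of shape (batch_size, seq_len, ...). Chunks may cross
        episode boundaries; the training loop should reset the recurrent
        state wherever is_first is True."""
        starts = self.rng.integers(0, self.size - seq_len + 1, size=batch_size)
        idx = starts[:, None] + np.arange(seq_len)[None, :]
        return {
            "obs": self.obs[idx],
            "act": self.act[idx],
            "rew": self.rew[idx],
            "is_first": self.is_first[idx],
            # Continue target: only true terminations stop bootstrapping.
            "cont": 1.0 - self.terminated[idx].astype(np.float32),
        }
```

Keeping the two flags separate leaves the decision to the training loop; DreamerV3 draws a similar line between `is_last` and `is_terminal`. One caveat of this particular row layout: the final observation of a truncated episode is never stored, so storing it as its own row would be needed if you want to bootstrap from it at a time limit.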

Armandpl commented 6 months ago

Alright, so we have proof of life:

https://github.com/Armandpl/minidream/assets/14967758/b3e1f0cd-c5a5-4193-9cad-3e57908b64b1

But it's unstable (chart from this run):

[W&B chart, 2/17/2024, 6:42:43 PM]

I set up actor-critic training on the env itself, bypassing the world model, and I can 'train'. It is only slightly more stable, but the scale of the losses is wrong. Training run here: [W&B chart, 2/17/2024, 6:50:46 PM]
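When the loss scale looks off, one thing worth checking is the return targets. Below is a sketch of TD(λ) returns and of the percentile-based return normalization described in the DreamerV3 paper (advantages divided by max(1, S), where S is an EMA, decay 0.99, of the range between the 5th and 95th percentiles of the returns). It uses numpy instead of pytorch for brevity; the function and class names, the γ = 0.997 / λ = 0.95 defaults, and initializing the EMA from the first batch are assumptions of this sketch.

```python
import numpy as np


def lambda_returns(rewards, values, cont, gamma=0.997, lam=0.95):
    """TD(lambda) targets, computed backwards over one (imagined) trajectory.

    rewards[t], cont[t]: reward and continue flag of transition t -> t+1, shape (T,)
    values[t]:           critic estimate v(s_t), shape (T + 1,)
    R_t = r_t + gamma * c_t * ((1 - lam) * v_{t+1} + lam * R_{t+1}),  R_T = v_T
    """
    T = len(rewards)
    returns = np.zeros(T)
    next_ret = values[T]  # bootstrap from the critic at the horizon
    for t in reversed(range(T)):
        next_ret = rewards[t] + gamma * cont[t] * (
            (1 - lam) * values[t + 1] + lam * next_ret
        )
        returns[t] = next_ret
    return returns


class ReturnNormalizer:
    """EMA of the 5th-95th percentile range of the returns.

    Dividing by max(1, S) shrinks large returns but never amplifies small ones.
    """

    def __init__(self, decay=0.99):
        self.decay = decay
        self.spread = None  # assumption: initialised from the first batch

    def update(self, returns):
        spread = np.percentile(returns, 95) - np.percentile(returns, 5)
        if self.spread is None:
            self.spread = spread
        else:
            self.spread = self.decay * self.spread + (1 - self.decay) * spread
        return max(1.0, self.spread)


def normalized_advantages(returns, values, normalizer):
    # values[:-1] are the baselines v(s_0) .. v(s_{T-1}) matching returns[0..T-1]
    scale = normalizer.update(returns)
    return (returns - values[:-1]) / scale
```

The max(1, S) floor means an env with small rewards keeps its raw advantage scale, while an env with returns in the hundreds gets its advantages shrunk by roughly that range.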

Armandpl commented 6 months ago

Found and fixed a few bugs in the actor-critic code, but it still isn't working, so we need to:

[ ] set up nm512's DreamerV3 and train it on cartpole to get a sense of how many steps it takes
[X] why is the sheeprl implementation returning a list of distributions as the actor output? Why not do it in one neural-net pass?
- seems like the nm512 implementation isn't doing that
- seems like doing it in one neural-net pass works too
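For independent action dimensions, a list of 1-D distributions and one diagonal distribution from a single forward pass give the same joint log-prob, because the joint is just the sum of the per-dimension log-probs. A small numpy check with a linear actor (hypothetical weights `W`, `b`; plain Gaussian heads without the tanh squashing or gSDE noise a real actor might use):

```python
import numpy as np

LOG_2PI = np.log(2.0 * np.pi)


def gaussian_log_prob(x, mean, log_std):
    # Elementwise log N(x; mean, exp(log_std)^2)
    return -0.5 * (((x - mean) / np.exp(log_std)) ** 2 + 2.0 * log_std + LOG_2PI)


def actor_single_pass(features, W, b):
    """One matmul yields mean and log-std for every action dim at once."""
    out = features @ W + b                      # (batch, 2 * act_dim)
    mean, log_std = np.split(out, 2, axis=-1)   # each (batch, act_dim)
    return mean, log_std


def log_prob_single_pass(actions, features, W, b):
    mean, log_std = actor_single_pass(features, W, b)
    # Independent dims: the joint log-prob is the sum over the action axis.
    return gaussian_log_prob(actions, mean, log_std).sum(axis=-1)


def log_prob_list_of_dists(actions, features, W, b):
    """Same weights, but one 1-D distribution per action dim, kept in a list."""
    act_dim = actions.shape[-1]
    dists = [
        (features @ W[:, i] + b[i], features @ W[:, act_dim + i] + b[act_dim + i])
        for i in range(act_dim)
    ]
    return sum(gaussian_log_prob(actions[:, i], m, s) for i, (m, s) in enumerate(dists))
```

A list of distributions only adds something when the dimensions are not independent or not of the same type; for a diagonal Gaussian the single pass is equivalent and cheaper.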

Armandpl commented 6 months ago

https://wandb.ai/armandpl/minidream_dev/runs/l6970cbc/workspace?workspace=user-armandpl Seems like we can reliably solve cartpole in ~60k steps. That seems like a lot of steps? Benchmark against SAC? Against other Dreamer implementations?
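To compare sample efficiency against SAC or other Dreamer implementations, it helps to fix one definition of "solved" and apply it to every run. A sketch, assuming "solved" means the moving average of the last `window` episode returns reaching a threshold; the threshold is env-specific (Gymnasium's `CartPole-v1`, for instance, registers a reward threshold of 475), and both function names are hypothetical.

```python
import numpy as np


def steps_to_threshold(env_steps, episode_returns, threshold, window=10):
    """Env step at which the moving average of the last `window` episode
    returns first reaches `threshold`; None if it never does.

    env_steps[i] is the cumulative env step count when episode i ended.
    """
    steps = np.asarray(env_steps)
    rets = np.asarray(episode_returns, dtype=float)
    if rets.size < window:
        return None
    # smooth[i] is the mean of rets[i : i + window]
    smooth = np.convolve(rets, np.ones(window) / window, mode="valid")
    hits = np.flatnonzero(smooth >= threshold)
    if hits.size == 0:
        return None
    # That average only becomes available once episode i + window - 1 has ended.
    return int(steps[hits[0] + window - 1])


def summarize_runs(runs, threshold, window=10):
    """runs: list of (env_steps, episode_returns) pairs, one per seed."""
    results = [steps_to_threshold(s, r, threshold, window) for s, r in runs]
    solved = [x for x in results if x is not None]
    return {
        "solved": len(solved),
        "seeds": len(results),
        "median_steps": float(np.median(solved)) if solved else None,
    }
```

Reporting the median over several seeds, together with how many seeds solved the task at all, is less sensitive to one lucky or unlucky run than comparing single curves.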