Why is the posterior being sampled for the policy during inference?

danijar / dreamerv3

Mastering Diverse Domains through World Models

https://danijar.com/dreamerv3

MIT License

Why is the posterior being sampled for the policy during inference? #133

Closed: sai-prasanna closed this issue 3 months ago

sai-prasanna commented 5 months ago

In dreamerv2, a flag based on `mode == 'train'` was passed to the posterior computation in `observe`, so that the mode of the stochastic state could be used instead of a sample. I notice that dreamerv3 now always samples. Is this intended, or is it inconsequential?

https://github.com/danijar/dreamerv3/blob/29eb964e2918a3f4db04086f7f51b60388e97f3d/dreamerv3/agent.py#L129
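
For readers unfamiliar with the distinction, here is a minimal sketch of sampling versus taking the mode of a categorical stochastic state. It is not the code from either repository: the function `stochastic_state`, its `sample` argument, and the example logits are hypothetical names and values chosen for illustration, and it uses only the Python standard library rather than the agent's actual framework.

```python
import math
import random


def softmax(logits):
    """Convert a list of logits into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(index, size):
    """Return a one-hot list of length `size` with a 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def stochastic_state(logits, sample, rng):
    """Hypothetical helper: build a one-hot state per categorical variable.

    logits: list of lists, one row of logits per categorical variable.
    sample: if True, draw each class from its distribution;
            if False, take the most likely class (the mode).
    rng:    a random.Random instance, used only when sample is True.
    """
    state = []
    for row in logits:
        probs = softmax(row)
        if sample:
            index = rng.choices(range(len(row)), weights=probs, k=1)[0]
        else:
            index = max(range(len(row)), key=lambda i: probs[i])
        state.append(one_hot(index, len(row)))
    return state


# Assumed example: two categorical variables with three classes each.
logits = [[2.0, 0.5, -1.0], [0.0, 0.1, 3.0]]
rng = random.Random(0)

mode_state = stochastic_state(logits, sample=False, rng=rng)
sampled_state = stochastic_state(logits, sample=True, rng=rng)
```

With `sample=False` the state is fully determined by the logits, while with `sample=True` it depends on the random generator, so repeated calls can return different one-hot vectors for the same posterior.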

danijar commented 3 months ago

It's intended because it's easier and still works quite well.