QUESTION: fast and slow critic vs. weight EMA.

danijar / dreamerv3

Mastering Diverse Domains through World Models

https://danijar.com/dreamerv3

MIT License


QUESTION: fast and slow critic vs. weight EMA. #59

Closed jadkins99 closed 1 year ago

jadkins99 commented 1 year ago

The paper states "We compute λ-returns using the fast critic network and regularize the critic outputs towards those of its own weight EMA instead of computing returns using the slow critic. However, both approaches perform similarly in practice."

I am confused about the definition of fast and slow critics. Is the slow critic similar to a target network? Also, what was used in this repo? The regularizer term in the critic loss seems to involve a slow critic. Can you please explain?

danijar commented 1 year ago

Hi, the slow critic is just an EMA of the weights of the fast (= normal) critic network. The question is whether the slow critic is used to compute the TD(λ) targets (as in most prior literature) or whether it is used as a separate regularizer (as described in the paper and implemented in the repo).
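To make the two options concrete, here is a minimal sketch in plain Python. It is not the code from the repo: the linear `critic`, the squared-error losses, the function names, and the values for `rate`, `discount`, `lam`, and `reg` are all assumptions chosen for illustration (the actual implementation uses a different critic loss and its own hyperparameters).

```python
def ema_update(slow, fast, rate=0.02):
    """Slow critic weights: move a fraction `rate` toward the fast weights."""
    return [(1 - rate) * s + rate * f for s, f in zip(slow, fast)]


def critic(weights, feat):
    """Toy linear critic (hypothetical stand-in for the value network)."""
    return sum(w * x for w, x in zip(weights, feat))


def lambda_returns(rewards, values, discount=0.997, lam=0.95):
    """TD(lambda) targets: R_t = r_t + discount * ((1 - lam) * v_{t+1} + lam * R_{t+1}).

    `values` needs len(rewards) + 1 entries; the last one bootstraps the return.
    """
    ret = values[-1]
    out = []
    for t in reversed(range(len(rewards))):
        ret = rewards[t] + discount * ((1 - lam) * values[t + 1] + lam * ret)
        out.append(ret)
    return out[::-1]


def loss_slow_targets(fast, slow, feats, rewards):
    """Option 1 (target-network style): returns are computed with the SLOW critic."""
    values = [critic(slow, f) for f in feats]
    targets = lambda_returns(rewards, values)
    # zip stops at len(targets), so the final bootstrap state gets no loss term.
    return sum((critic(fast, f) - r) ** 2 for f, r in zip(feats, targets))


def loss_slow_regularizer(fast, slow, feats, rewards, reg=1.0):
    """Option 2: returns come from the FAST critic, and the slow critic only
    enters through an extra term pulling fast outputs toward slow outputs."""
    values = [critic(fast, f) for f in feats]
    # In a real training loop the targets would be treated as constants
    # (stop-gradient); this sketch only evaluates the loss value.
    targets = lambda_returns(rewards, values)
    td = sum((critic(fast, f) - r) ** 2 for f, r in zip(feats, targets))
    pull = sum((critic(fast, f) - critic(slow, f)) ** 2 for f in feats[:-1])
    return td + reg * pull


# Toy rollout: three states with two features each, two rewards.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rewards = [0.1, 0.2]
fast = [1.0, 0.5]
slow = ema_update([0.0, 0.0], fast)  # slow weights lag behind the fast ones
print(loss_slow_targets(fast, slow, feats, rewards))
print(loss_slow_regularizer(fast, slow, feats, rewards))
```

In both options the slow weights are maintained by the same EMA update; the only difference is where the slow critic's outputs enter the loss. When the slow and fast weights coincide, the two losses in this sketch are identical.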