danijar / dreamerv3

Mastering Diverse Domains through World Models
https://danijar.com/dreamerv3
MIT License

Symlog for image-based observations #73

Closed giorgiofranceschelli closed 1 year ago

giorgiofranceschelli commented 1 year ago

Hi! First of all: great work, really inspiring.

As far as I understood from the paper, symlog is used to transform the inputs to the world model, images included. Likewise, the decoder is trained via the symlog loss, i.e., we need to apply symexp to the decoder's output to get the reconstructed image.

However, I'm struggling to find where these transformations happen in the code: MLP applies jaxutils.symlog(feat) when self._symlog_inputs is True, but ImageEncoderResnet does not. In addition, the output distribution for the CNN heads of MultiDecoder cannot be a SymlogDist, as I was expecting.

Have I understood it wrong from the paper, or am I missing something from the code?
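For context, the symlog/symexp pair discussed here is defined in the paper as symlog(x) = sign(x) log(1 + |x|) and symexp(x) = sign(x) (exp(|x|) - 1). A minimal numpy sketch (a stand-in for the repo's jaxutils versions, not the actual implementation):

```python
import numpy as np

def symlog(x):
    # Symmetric log: compresses large magnitudes while staying
    # close to the identity near zero; defined for any sign.
    return np.sign(x) * np.log1p(np.abs(x))

def symexp(x):
    # Exact inverse of symlog.
    return np.sign(x) * np.expm1(np.abs(x))

# Round-trip check on values spanning several orders of magnitude.
x = np.array([-1000.0, -1.0, 0.0, 0.5, 100.0])
recovered = symexp(symlog(x))
```

Because the pair is an exact bijection, applying symlog to targets and symexp to predictions loses no information; it only reshapes the scale the network operates on.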

schneimo commented 1 year ago

From my understanding, the paper does not mention anywhere that symlog is applied to image observations, though it is understandable that one might guess so.

You can read quite the opposite in Appendix E, under the paragraph NoObsSymlog: "Because symlog encoding is only used for vector observations, this ablation is equivalent to DreamerV3 on purely image-based environments."

giorgiofranceschelli commented 1 year ago

Yes, you are totally right - I was misled by the last paragraph of the "Symlog Predictions" section, but Appendix E does fully answer my question. Thank you.

danijar commented 1 year ago

Hi, thanks for pointing out that it's a bit unclear in the paper. Because symlog is basically an identity function near zero, it wouldn't really make a difference whether or not to apply it to pixels that are already in the range [0, 1].
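A quick numeric check of that point (my own sketch, not code from the repo): on [0, 1] the deviation |symlog(x) - x| grows monotonically and peaks at x = 1, where it equals 1 - log(2) ≈ 0.307, so symlog only mildly rescales normalized pixels rather than distorting them.

```python
import numpy as np

def symlog(x):
    # symlog(x) = sign(x) * log(1 + |x|); log1p(x) ~ x for small x.
    return np.sign(x) * np.log1p(np.abs(x))

# Pixels normalized to [0, 1].
pixels = np.linspace(0.0, 1.0, 101)
deviation = np.abs(symlog(pixels) - pixels)
max_dev = deviation.max()  # attained at x = 1: 1 - log(2)
```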