facebookresearch / denoised_mdp

Open source code for paper "Denoised MDPs: Learning World Models Better Than the World Itself"

Denoised MDPs: Learning World Models Better Than The World Itself

Tongzhou Wang, Simon S. Du, Antonio Torralba, Phillip Isola, Amy Zhang, Yuandong Tian

We provide a PyTorch implementation of Denoised MDPs: Learning World Models Better Than The World Itself, published in ICML 2022.

(We also provide a PyTorch implementation of Dreamer that is carefully written and verified to reproduce results. See here for usage.)

The raw real world is noisy. How can a reinforcement learning agent learn successfully from such raw data, where signals can be strongly entangled with noise? Denoised MDP categorizes information into four distinct types, based on controllability and relation to rewards, and proposes to extract a state representation space containing only information that is both controllable and reward-relevant. Under this view, several prior works can be seen as insufficiently removing noisy information.
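As a rough illustration of this categorization, the sketch below enumerates the four combinations of controllability and reward relevance and marks the single combination that is kept as signal. The class name `InfoType` and the property `is_signal` are hypothetical names chosen for this example; they are not taken from the repository's code.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class InfoType:
    """One of the four information types (hypothetical helper, not repo code)."""
    controllable: bool
    reward_relevant: bool

    @property
    def is_signal(self) -> bool:
        # Only information that is both controllable and reward-relevant
        # is kept in the denoised state representation.
        return self.controllable and self.reward_relevant


def all_info_types():
    """Enumerate every combination of the two binary properties."""
    return [InfoType(c, r) for c, r in product((True, False), repeat=2)]


if __name__ == "__main__":
    for info in all_info_types():
        label = "signal" if info.is_signal else "noise"
        print(info, "->", label)
```

Everything that fails either test is treated as noise from the agent's point of view, which is why exactly one of the four types survives.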

To properly extract only the useful signal, Denoised MDP considers novel factorized MDP transition structures, where signal representation and noise representation are separated into distinct latent spaces. The state abstraction (i.e., representation learning) problem is turned into a regularized model fitting problem: fitting the factorized forward model to collected trajectories, while requiring the signal latents to be minimally informative of the raw observations.
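The shape of such a regularized fitting objective can be sketched as follows. This is a toy illustration under stated assumptions, not the loss implemented in this repository: it assumes diagonal Gaussian posteriors and priors, uses a KL divergence as a proxy for how informative a latent is, and weights the signal-latent term with a hypothetical coefficient `beta`. The function names `gaussian_kl` and `regularized_fit_loss` are invented for this example.

```python
import math


def gaussian_kl(mu_q, std_q, mu_p, std_p):
    """KL(q || p) between two diagonal Gaussians, summed over dimensions."""
    return sum(
        math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2 * sp ** 2) - 0.5
        for mq, sq, mp, sp in zip(mu_q, std_q, mu_p, std_p)
    )


def regularized_fit_loss(recon_nll, signal_kl, noise_kl, beta=2.0):
    """Model-fitting term plus information penalties (illustrative only).

    recon_nll: negative log-likelihood of observed trajectories under the model.
    signal_kl: KL penalty on the signal latents.
    noise_kl:  KL penalty on the noise latents.
    beta:      assumed extra weight (> 1) that pushes the signal latents to
               carry as little information about observations as possible.
    """
    return recon_nll + beta * signal_kl + noise_kl


if __name__ == "__main__":
    # Hypothetical numbers for a single time step.
    signal_kl = gaussian_kl([0.5, -0.2], [0.8, 1.0], [0.0, 0.0], [1.0, 1.0])
    noise_kl = gaussian_kl([1.0], [1.0], [0.0], [1.0])
    print(regularized_fit_loss(recon_nll=3.2, signal_kl=signal_kl, noise_kl=noise_kl))
```

Penalizing the signal latents more heavily than the noise latents is one simple way to encourage anything not needed for control and reward prediction to flow into the noise latents instead; the paper's derivation gives the actual formulation.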

The resulting variational formulation (derived in the paper) successfully disentangles a variety of noise types and also handles noiseless settings, outperforming baseline methods that often do well only on particular noise types.

Visualizations

For environments with distinct types of noise, we visualize the latent factorization identified by Denoised MDP and by other baseline methods. Only Denoised MDP successfully disentangles signal from noise across all environments.