daisatojp / mpo

PyTorch Implementation of the Maximum a Posteriori Policy Optimisation
GNU General Public License v3.0
70 stars 19 forks source link

negative determinant for Σ #11

Closed nilsplettenberg closed 3 years ago

nilsplettenberg commented 3 years ago

Hi, first of all, thanks for your awesome implementation, it really helped me understanding MPO much better!

I tried to use the model for a custom environment, which sometimes works fine, but quite often i get an error when computing the KL divergence: for some reason, the determinants of Σ and Σi get negative for some samples, resulting in nans in line 47 of mpo.py because the argument of the log is negative: inner_Σ = torch.log(Σ.det() / Σi.det()) - n + btr(Σ_inv @ Σi) # (B,) The output of the actor looks alright to me, it is a diagonal matrix with positive entries on the diagonal axis, as described in the paper. As far as I understand, the covariance Σ should be positive definite when computed with the Cholesky output from the actor. I'm not too sure about this because it's been some time since I had my last math lecture, but according to a quick Google search, the determinant of a positive definite matrix should be positive. So i don't understand how the determinant of Σ or Σi get negative. Do you have any idea how this could happen? The state that is fed to the actor looks alright, no nans or anything similar, just normal floats.

nilsplettenberg commented 3 years ago

nevermind, I solved it by normalizing the states

daisatojp commented 3 years ago

Hi @nilsplettenberg

I found the determinant can be minus due to numerical calculation error. For example

import torch
A = torch.tensor([[ 3.0744e+00,  0.0000e+00], [-1.1290e+01,  5.5806e-04]], dtype=torch.float32)
C = A @ A.transpose(0, 1)
C.det()  # tensor(-5.9274e-05)

In the case of numpy,

import numpy as np
A = np.array([[ 3.0744e+00,  0.0000e+00], [-1.1290e+01,  5.5806e-04]], dtype=np.float32)
C = A @ A.transpose()
np.linalg.det(C)  # -0.0001203164

I clamp the determinants. https://github.com/daisatojp/mpo/commit/e349020c589e3fbd617d711e0f902b111020c649#diff-6308fe3947ec88b84d6ac4435356797afca76d921fa6092b2f454082501a88d6R47

Thanks!

nilsplettenberg commented 3 years ago

Interesting, I've never checked whether to computation of the determinant was in fact correct because I thought it's a rather simple operation that couldn't go wrong. I always assumed the problem was somewhere else. Thanks for the update and fix!