Closed nilsplettenberg closed 3 years ago
nevermind, I solved it by normalizing the states
Hi @nilsplettenberg
I found the determinant can be minus due to numerical calculation error. For example
import torch
A = torch.tensor([[ 3.0744e+00, 0.0000e+00], [-1.1290e+01, 5.5806e-04]], dtype=torch.float32)
C = A @ A.transpose(0, 1)
C.det() # tensor(-5.9274e-05)
In the case of numpy,
import numpy as np
A = np.array([[ 3.0744e+00, 0.0000e+00], [-1.1290e+01, 5.5806e-04]], dtype=np.float32)
C = A @ A.transpose()
np.linalg.det(C) # -0.0001203164
I clamp the determinants. https://github.com/daisatojp/mpo/commit/e349020c589e3fbd617d711e0f902b111020c649#diff-6308fe3947ec88b84d6ac4435356797afca76d921fa6092b2f454082501a88d6R47
Thanks!
Interesting, I've never checked whether to computation of the determinant was in fact correct because I thought it's a rather simple operation that couldn't go wrong. I always assumed the problem was somewhere else. Thanks for the update and fix!
Hi, first of all, thanks for your awesome implementation, it really helped me understanding MPO much better!
I tried to use the model for a custom environment, which sometimes works fine, but quite often i get an error when computing the KL divergence: for some reason, the determinants of Σ and Σi get negative for some samples, resulting in nans in line 47 of mpo.py because the argument of the log is negative:
inner_Σ = torch.log(Σ.det() / Σi.det()) - n + btr(Σ_inv @ Σi) # (B,)
The output of the actor looks alright to me, it is a diagonal matrix with positive entries on the diagonal axis, as described in the paper. As far as I understand, the covariance Σ should be positive definite when computed with the Cholesky output from the actor. I'm not too sure about this because it's been some time since I had my last math lecture, but according to a quick Google search, the determinant of a positive definite matrix should be positive. So i don't understand how the determinant of Σ or Σi get negative. Do you have any idea how this could happen? The state that is fed to the actor looks alright, no nans or anything similar, just normal floats.