daisatojp / mpo

PyTorch Implementation of the Maximum a Posteriori Policy Optimisation
GNU General Public License v3.0

Question: Minimization of dual function #10

Open vinerich opened 3 years ago

vinerich commented 3 years ago

Hey again.

While evaluating MPO I ran into some strange raise ValueError("`x0` violates bound constraints.") errors. They originate in this line. I have since implemented a "clamping" with np.max([self.η,1e-6]).

According to SciPy's code for checking the bound constraints, this should be totally fine. But I keep getting this error from time to time, and training stops completely when it occurs.

Lines from the corresponding file scipy/optimize/_numdiff.py:

    if np.any((x0 < lb) | (x0 > ub)):
        raise ValueError("`x0` violates bound constraints.")
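
For illustration, the condition above can be evaluated in isolation. This is a minimal sketch that assumes a lower bound of 1e-6 and no upper bound (the bounds actually used are not shown in this thread); `violates_bounds` is a hypothetical helper, not a SciPy function.

```python
import numpy as np

def violates_bounds(x0, lb, ub):
    """Hypothetical helper that mirrors the condition quoted above."""
    x0 = np.asarray(x0, dtype=float)
    return bool(np.any((x0 < lb) | (x0 > ub)))

# Assumed bounds, chosen only for illustration.
lb, ub = 1e-6, np.inf

# A value sitting exactly on the lower bound passes the check.
at_bound = violates_bounds([1e-6], lb, ub)

# A value slightly below the lower bound fails it.
below_bound = violates_bounds([1e-6 - 1e-12], lb, ub)
```

Under these assumptions, a value clamped to exactly 1e-6 satisfies the check, so the error would have to come from a point that ends up below the bound by some other route.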

Bounds are prepared like this:

def _prepare_bounds(bounds, x0):
    """
    Prepares new-style bounds from a two-tuple specifying the lower and upper
    limits for values in x0. If a value is not bound then the lower/upper bound
    will be expected to be -np.inf/np.inf.

    Examples
    --------
    >>> _prepare_bounds([(0, 1, 2), (1, 2, np.inf)], [0.5, 1.5, 2.5])
    (array([0., 1., 2.]), array([ 1.,  2., inf]))
    """
    lb, ub = [np.asarray(b, dtype=float) for b in bounds]
    if lb.ndim == 0:
        lb = np.resize(lb, x0.shape)

    if ub.ndim == 0:
        ub = np.resize(ub, x0.shape)

    return lb, ub
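
As a usage sketch, the body quoted above can be copied into a local function to see how scalar bounds are expanded to the shape of x0. The name `prepare_bounds` and the bound values (1e-6, inf) are assumptions made for this example.

```python
import numpy as np

def prepare_bounds(bounds, x0):
    # Local copy of the logic quoted above, for experimentation only.
    lb, ub = [np.asarray(b, dtype=float) for b in bounds]
    if lb.ndim == 0:
        lb = np.resize(lb, x0.shape)  # repeat the scalar for every entry of x0
    if ub.ndim == 0:
        ub = np.resize(ub, x0.shape)
    return lb, ub

x0 = np.array([0.5, 1.5])

# Scalar bounds (assumed values) become one bound per element of x0.
lb, ub = prepare_bounds((1e-6, np.inf), x0)
```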

Any idea about this? It's not really an algorithm-related question, but it seems strange to me.

daisatojp commented 3 years ago

Hmm, it seems this should not happen in the latest version of scipy, as discussed in https://github.com/scipy/scipy/issues/11403 and https://github.com/scipy/scipy/issues/13277.

Please tell me the following. Which versions of scipy and numpy are you using? In my case, scipy==1.6.3 and numpy==1.20.3. Does it occur with the LunarLanderContinuous-v2 example? https://github.com/vinerich/mpo

At first I thought that if you want to clamp, it should be np.clip(self.η, -1e-6, 1e-6) rather than np.max([self.η,1e-6]). I was wrong: np.max([self.η,1e-6]) makes sense.
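
The difference between the two expressions discussed above can be checked directly. This is a small sketch with made-up values standing in for self.η; the third column, np.clip with only a lower limit, is added here for comparison and is not something proposed in the thread.

```python
import numpy as np

floor = 1e-6
rows = {}

# Illustrative values only; they are not taken from a training run.
for eta in [-0.5, 0.0, 5e-7, 2.0]:
    rows[eta] = (
        float(np.max([eta, floor])),         # lower floor, large values untouched
        float(np.clip(eta, -floor, floor)),  # squeezed into [-1e-6, 1e-6]
        float(np.clip(eta, floor, None)),    # lower floor via clip, same as max
    )

for eta, (via_max, via_sym_clip, via_low_clip) in rows.items():
    print(eta, via_max, via_sym_clip, via_low_clip)
```

The symmetric clip caps large values at 1e-6 and lets negative values through, while np.max only enforces the lower floor.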

vinerich commented 3 years ago

scipy=1.6.3 numpy=1.20.2

I looked at both issues you mentioned, and I frequently see the warning described in https://github.com/scipy/scipy/issues/13277. So that part seems to be working.

But sometimes it still gives me the error above. Sadly, I can't reproduce it, as I didn't have proper logging set up and it only occurs roughly once every ~4 million timesteps.
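
Because the failure is so rare, one option is to wrap the optimizer call so that the inputs are logged whenever it raises. This is a minimal sketch, not the repository's code: `solve_with_logging`, the logger name, the fallback value, and `failing_solver` (a stand-in that always raises) are all hypothetical.

```python
import logging

logger = logging.getLogger("mpo.dual")  # hypothetical logger name

def solve_with_logging(solve, x0, bounds, fallback):
    """Call solve(x0, bounds); on ValueError, log the inputs and return fallback."""
    try:
        return solve(x0, bounds)
    except ValueError:
        # %r prints floats at full precision, which matters near a bound.
        logger.exception("solver failed: x0=%r bounds=%r", list(x0), bounds)
        return fallback

def failing_solver(x0, bounds):
    # Stand-in for the real optimizer call; always raises for demonstration.
    raise ValueError("`x0` violates bound constraints.")

# Training would continue with the fallback instead of crashing.
result = solve_with_logging(failing_solver, [9.99e-7], [(1e-6, None)], fallback=1e-6)
```

In practice `solve` would wrap whatever optimizer call the training loop makes, and the logged x0 and bounds could then be replayed offline to reproduce the error.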

I will check LunarLanderContinuous, let it run for a day or so, and report back.