DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

[Bug]: Potential Bug in PPO? Clarification requested #1894

Closed. azrael417 closed this issue 2 months ago.

azrael417 commented 2 months ago

🐛 Bug

Hello all,

Apologies if this is a false alarm, but shouldn't the th.min in

https://github.com/DLR-RM/stable-baselines3/blob/5623d98f9d6bcfd2ab450e850c3f7b090aef5642/stable_baselines3/ppo/ppo.py#L231

be th.minimum instead? th.min behaves differently from th.minimum. I was reimplementing your PPO algorithm in libtorch, the compiler barfed at exactly that point, and I think the compiler is right.
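
For context, the line in question computes the clipped surrogate objective. Paraphrased (variable names approximate, with dummy inputs so it runs standalone), it looks like this:

import torch as th

# Dummy stand-ins for the quantities computed earlier in ppo.py
advantages = th.randn(8)
ratio = th.exp(th.randn(8) * 0.1)  # new/old policy probability ratio
clip_range = 0.2

# Clipped surrogate objective, paraphrased from ppo.py (names approximate)
policy_loss_1 = advantages * ratio
policy_loss_2 = advantages * th.clamp(ratio, 1 - clip_range, 1 + clip_range)
policy_loss = -th.min(policy_loss_1, policy_loss_2).mean()  # <- the th.min in question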

Let me know what you think

Best regards,
Thorsten

To Reproduce

from stable_baselines3 import ...

Relevant log output / Error message

No response

System Info

No response

araffin commented 2 months ago

th.min behaves differently from th.minimum.

Are you sure?

import torch as th

# Two tensors that differ element-wise
a = th.ones(2, 4)
b = th.zeros(2, 4)
b[0, 2] = 3

# The two-tensor form of th.min is element-wise, just like th.minimum
assert th.allclose(th.min(a, b), th.minimum(a, b))

EDIT: If you look at the documentation (https://pytorch.org/docs/stable/generated/torch.min.html), the two-tensor overload torch.min(input, other, *, out=None) redirects to torch.minimum(). So I think they are actually the same.
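
For anyone reading along: the two functions only differ in the single-tensor overloads. A quick sketch of the different call forms (worth double-checking against the docs for your PyTorch version):

import torch as th

x = th.tensor([[1.0, 4.0], [3.0, 2.0]])
y = th.tensor([[2.0, 0.0], [5.0, 1.0]])

print(th.min(x))         # tensor(1.) -- reduction over all elements
print(th.min(x, dim=1))  # (values, indices) namedtuple: per-row min and argmin
print(th.min(x, y))      # element-wise minimum, same as th.minimum(x, y)
print(th.minimum(x, y))  # tensor([[1., 0.], [3., 1.]]); requires two tensors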

azrael417 commented 2 months ago

You are right: when given two tensors, th.min seems to behave the same as th.minimum. That is good to know. You can close this issue.