facebookresearch / mbrl-lib

Library for Model Based RL
MIT License

[Bug] RuntimeError: CUDA error: device-side assert triggered #173

Closed melfm closed 1 year ago

melfm commented 1 year ago

Steps to reproduce

Just trying to run the example environment with MBPO.

  1. python -m mbrl.examples.main algorithm=mbpo overrides=mbpo_halfcheetah

Observed Results

I noticed that this error happens when the flag self.use_only_elite is True; something goes wrong in the weight multiplication.

  File "~/mbrl-lib/mbrl/models/util.py", line 55, in forward
    xw = x.matmul(self.weight[self.elite_models, ...])
RuntimeError: CUDA error: device-side assert triggered
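
A device-side assert on an indexing line like this one is often caused by an out-of-range index, although nothing in this thread confirms that this is the cause here. Because CUDA reports such errors asynchronously, rerunning with the standard `CUDA_LAUNCH_BLOCKING=1` environment variable (or on CPU) usually gives a more precise traceback. A minimal sketch of both checks follows; `build_debug_env` and `check_elite_indices` are hypothetical helpers for illustration and are not part of mbrl-lib:

```python
import os


def build_debug_env(base_env=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base_env is None else base_env)
    # With this set, CUDA errors surface at the call that caused them.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def check_elite_indices(elite_models, ensemble_size):
    """Raise IndexError if an elite index cannot address the ensemble dimension.

    Assumes non-negative integer indices, as a list of elite model ids would be.
    """
    bad = [i for i in elite_models if not 0 <= i < ensemble_size]
    if bad:
        raise IndexError(
            f"elite indices {bad} out of range for ensemble of size {ensemble_size}"
        )
    return list(elite_models)
```

The environment returned by `build_debug_env()` could be passed as `env=` to `subprocess.run` when launching the command from the reproduction steps, and `check_elite_indices` could be called just before the failing indexing line to rule out a bad index.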

I'm not sure if there is a problem with installation or versioning from my end, but the notebook examples seem to run fine.

luisenp commented 1 year ago

Hi @melfm. Sorry for the delay. I have an internal deadline today and this slipped my mind. Do you still have this issue? What torch version are you using?

melfm commented 1 year ago

Hi @luisenp - I'm using torch '1.8.1+cu111' and Python 3.8.8. The pets_example.ipynb notebook runs fine, though; the error only occurs when I run the main examples.

luisenp commented 1 year ago

Hi @melfm. I can confirm that the error also happens for me with those versions of torch and Python. I tried another Python environment with torch==1.12.1 and Python=3.9.13, and I don't get this error there. Is the torch version important for your use case?

melfm commented 1 year ago

Gotcha! The requirements indicate PyTorch (>= 1.7) so I didn't realize 1.8 would be problematic. I will try torch==1.12.1.

luisenp commented 1 year ago

Maybe it's some compatibility issue between the CUDA version and the torch version. The error doesn't happen for me in torch 1.8.1 with CUDA 10.2.
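
When comparing setups like the ones mentioned in this thread, it helps to know that a version string such as `1.8.1+cu111` carries the CUDA build in its local suffix (`cu111` means CUDA 11.1). A small sketch that splits such a string is below; `split_torch_version` is a hypothetical helper, and the input would normally come from `torch.__version__`:

```python
def split_torch_version(version):
    """Split a version string like '1.8.1+cu111' into release numbers and build tag.

    Assumes a plain numeric release (no 'a0' or 'rc' style suffixes).
    """
    release, _, local = version.partition("+")
    numbers = tuple(int(part) for part in release.split(".")[:3])
    # The local part is empty when the string has no '+' suffix.
    return numbers, (local or None)


if __name__ == "__main__":
    print(split_torch_version("1.8.1+cu111"))
    print(split_torch_version("1.12.1"))
```

Reporting both parts in a bug report makes it easier to tell a torch version problem apart from a CUDA build mismatch.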

melfm commented 1 year ago

OK, I see! I will try different version combinations for this issue then. Thanks a lot!