alexfrom0815 / Online-3D-BPP-DRL

This repository contains the implementation of the paper "Online 3D Bin Packing with Constrained Deep Reinforcement Learning".

When I enabled rotation, the program raised "RuntimeError: CUDA error: device-side assert triggered" #10

Closed: Q-ian closed this issue 2 years ago

Q-ian commented 2 years ago

Hi, alexfrom0815,

When I enabled rotation, I got errors like the following. Do you know what causes them?

Traceback (most recent call last):
  File "main.py", line 258, in <module>
    main(args)
  File "main.py", line 42, in main
    train_model()
  File "main.py", line 184, in train_model
    obs, reward, done, infos = envs.step(action)
  File "/mnt/.../baselines/common/vec_env/vec_env.py", line 107, in step
    self.step_async(actions)
  File "/mnt/.../Online-3D-BPP-DRL/acktr/envs.py", line 188, in step_async
    actions = actions.cpu().numpy()
RuntimeError: CUDA error: device-side assert triggered
/opt/conda/conda-bld/pytorch_1556653114079/work/aten/src/THC/THCTensorRandom.cuh:193: void sampleMultinomialOnce(long *, long, int, T *, T *, int, int) [with T = float, AccT = float]: block: [15,0,0], thread: [0,0,0] Assertion `THCNumerics<T>::ge(val, zero)` failed.
 ...
/opt/conda/conda-bld/pytorch_1556653114079/work/aten/src/THC/THCTensorRandom.cuh:193: void sampleMultinomialOnce(long *, long, int, T *, T *, int, int) [with T = float, AccT = float]: block: [15,0,0], thread: [191,0,0] Assertion `THCNumerics<T>::ge(val, zero)` failed.
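
A note on reading the log above: the failed assertion `THCNumerics<T>::ge(val, zero)` in `sampleMultinomialOnce` means that a value handed to the multinomial sampler was not greater than or equal to zero, which covers both negative entries and NaN. CUDA reports such errors asynchronously, so the Python traceback points at `actions.cpu().numpy()` rather than at the sampling call itself; setting the environment variable `CUDA_LAUNCH_BLOCKING=1` usually makes the traceback land on the real call site. Where the bad values come from is not established in this thread. The snippet below is only a minimal sketch of the kind of check that could be run on one row of action probabilities before sampling; `find_invalid_probs` is a hypothetical helper, not part of the repository, and it works on a plain Python list.

```python
import math


def find_invalid_probs(probs):
    """Return (index, value, reason) for each entry a multinomial sampler would reject.

    Hypothetical debugging helper: `probs` is one row of action
    probabilities given as a plain list of floats.
    """
    problems = []
    for i, p in enumerate(probs):
        if math.isnan(p):
            problems.append((i, p, "nan"))
        elif math.isinf(p):
            problems.append((i, p, "inf"))
        elif p < 0:
            problems.append((i, p, "negative"))
    # A row with no individually bad entry can still be unusable
    # if all of its mass is zero.
    if not problems and sum(probs) <= 0:
        problems.append((None, sum(probs), "non-positive sum"))
    return problems


if __name__ == "__main__":
    print(find_invalid_probs([0.25, 0.75]))
    print(find_invalid_probs([0.2, float("nan"), -0.1]))
```

With a tensor, one row could be converted with `.tolist()` after moving it to the CPU and then passed to the helper. An empty result means the row looks safe to sample from.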
Xiong5Heng commented 2 years ago

I have the same problem. How did you solve it?

RosCstr commented 2 years ago

Hello. I have a similar problem. Please advise whether I should open a new issue for it. When I enable rotations, the program starts well, but then I get the following:

...
enable_rotation: True
use cuda: True
item set: [(2, 2, 2), (2, 2, 3), (2, 2, 4), (2, 2, 5), (2, 3, 2), (2, 3, 3), (2, 3, 4), (2, 3, 5), (2, 4, 2), (2, 4, 3), (2, 4, 4), (2, 4, 5), (2, 5, 2), (2, 5, 3), (2, 5, 4), (2, 5, 5), (3, 2, 2), (3, 2, 3), (3, 2, 4), (3, 2, 5), (3, 3, 2), (3, 3, 3), (3, 3, 4), (3, 3, 5), (3, 4, 2), (3, 4, 3), (3, 4, 4), (3, 4, 5), (3, 5, 2), (3, 5, 3), (3, 5, 4), (3, 5, 5), (4, 2, 2), (4, 2, 3), (4, 2, 4), (4, 2, 5), (4, 3, 2), (4, 3, 3), (4, 3, 4), (4, 3, 5), (4, 4, 2), (4, 4, 3), (4, 4, 4), (4, 4, 5), (4, 5, 2), (4, 5, 3), (4, 5, 4), (4, 5, 5), (5, 2, 2), (5, 2, 3), (5, 2, 4), (5, 2, 5), (5, 3, 2), (5, 3, 3), (5, 3, 4), (5, 3, 5), (5, 4, 2), (5, 4, 3), (5, 4, 4), (5, 4, 5), (5, 5, 2), (5, 5, 3), (5, 5, 4), (5, 5, 5)]
Traceback (most recent call last):
  File "main.py", line 258, in <module>
    main(args)
  File "main.py", line 40, in main
    test_model()
  File "main.py", line 47, in test_model
    unified_test(model_url, config)
  File "unified_test.py", line 40, in unified_test
    nmodel = nnModel(url, config)
  File "model_loader.py", line 16, in __init__
    self._model = self._load_model(url)
  File "model_loader.py", line 33, in _load_model
    actor_critic.load_state_dict(load_dict)
  File "/home/vlad/anaconda3/envs/3D-BPP-DRL/lib/python3.7/site-packages/torch/nn/modules/module.py", line 777, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Policy:
	size mismatch for base.mask.5.weight: copying a param with shape torch.Size([100, 256]) from checkpoint, the shape in current model is torch.Size([200, 256]).
	size mismatch for base.mask.5.bias: copying a param with shape torch.Size([100]) from checkpoint, the shape in current model is torch.Size([200]).
	size mismatch for dist.linear.weight: copying a param with shape torch.Size([100, 256]) from checkpoint, the shape in current model is torch.Size([200, 256]).
	size mismatch for dist.linear.bias: copying a param with shape torch.Size([100]) from checkpoint, the shape in current model is torch.Size([200]).

If I disable rotations from the terminal, everything works well. Please advise on this. Thank you.
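
For anyone reading the size mismatch messages above: they say that the checkpoint stores 100 outputs for the `base.mask.5` and `dist.linear` layers, while the model built with `enable_rotation: True` expects 200. One possible reading, which is not confirmed anywhere in this thread, is that the checkpoint was trained without rotation and so has a smaller action space than the model being constructed. The sketch below only shows how such mismatches could be listed before calling `load_state_dict`; `shape_mismatches` is a hypothetical helper, and the shapes are copied by hand from the error message rather than read from a real checkpoint.

```python
def shape_mismatches(checkpoint_shapes, model_shapes):
    """List (name, saved_shape, current_shape) for parameters whose shapes differ.

    Hypothetical helper: both arguments map parameter names to shape tuples.
    Only names present in both mappings are compared.
    """
    mismatches = []
    for name in sorted(checkpoint_shapes.keys() & model_shapes.keys()):
        saved = tuple(checkpoint_shapes[name])
        current = tuple(model_shapes[name])
        if saved != current:
            mismatches.append((name, saved, current))
    return mismatches


# Shapes transcribed from the error message in this comment.
checkpoint = {
    "base.mask.5.weight": (100, 256),
    "base.mask.5.bias": (100,),
    "dist.linear.weight": (100, 256),
    "dist.linear.bias": (100,),
}
model = {
    "base.mask.5.weight": (200, 256),
    "base.mask.5.bias": (200,),
    "dist.linear.weight": (200, 256),
    "dist.linear.bias": (200,),
}

if __name__ == "__main__":
    for name, saved, current in shape_mismatches(checkpoint, model):
        print(f"{name}: checkpoint {saved} vs model {current}")
```

With real PyTorch objects, the two mappings could be built with something like `{k: tuple(v.shape) for k, v in state_dict.items()}` for the loaded checkpoint and for `model.state_dict()`.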