AssertionError - Githubissues

JSA-458 commented 2 years ago

Dear author: Thanks for sharing your work; I am really interested in it. When I run "python main.py --no-cuda", I get the following error after hours of training.

Time version: 3-2022.03.19-12-45-46 is training
Updates 15320, num timesteps 4902720, FPS 225
Last 10 training episodes: mean/median reward 7.9/8.0, min/max reward 7.1/8.4
The dist entropy 0.68102, the value loss 0.59049, the action loss 0.06197
The mean space ratio is 0.7919, the ratio threshold is0.946

Traceback (most recent call last):
  File "main.py", line 61, in <module>
    main(args)
  File "main.py", line 56, in main
    trainTool.train_n_steps(envs, args, device)
  File "/home/Online-3D-BPP-PCT/train_tools.py", line 66, in train_n_steps
    selectedlogProb, selectedIdx, distentropy, = self.PCT_policy(all_nodes, normFactor = factor)
  File "/home/.conda/envs/Online3D/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/Online-3D-BPP-PCT/model.py", line 22, in forward
    o, p, distentropy, hidden, = self.actor(items, deterministic, normFactor = normFactor, evaluate = evaluate)
  File "/home/.conda/envs/Online3D/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/Online-3D-BPP-PCT/attention_model.py", line 129, in forward
    valid_length = valid_length)
  File "/home/Online-3D-BPP-PCT/attention_model.py", line 136, in _inner
    log_p, mask = self._get_log_p(fixed, mask)
  File "/home/Online-3D-BPP-PCT/attention_model.py", line 198, in _get_log_p
    assert not torch.isnan(log_p).any()
AssertionError

Python == 3.7.7, torch == 1.10.1, OS: Ubuntu 18.04

alexfrom0815 commented 2 years ago

I ran the same experiment but have not yet triggered this bug; I am still trying to identify its source. A simple suggestion: try changing the random seed, and the bug may not occur again.
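For anyone trying to narrow this down: the thread does not establish why log_p becomes NaN. One common way a masked log-softmax can produce NaN is when every candidate in a row is masked out, so the sketch below only illustrates that general failure mode. It is written in plain Python as a stand-in for the tensor operations, and the names masked_log_softmax and has_nan are hypothetical, not functions from this repository.

```python
import math


def masked_log_softmax(logits, mask):
    """Log-softmax over `logits`; positions where `mask` is True are set to -inf.

    Illustrative only: this is not the repository's _get_log_p.
    """
    masked = [-math.inf if m else x for x, m in zip(logits, mask)]
    peak = max(masked)
    # If every position is masked, peak is -inf and (-inf) - (-inf) is NaN,
    # which then propagates through exp, sum and log.
    shifted = [x - peak for x in masked]
    log_total = math.log(sum(math.exp(s) for s in shifted))
    return [s - log_total for s in shifted]


def has_nan(values):
    """Plain-Python analogue of torch.isnan(x).any()."""
    return any(math.isnan(v) for v in values)


# At least one valid position: masked entries become -inf, nothing is NaN.
partly_masked = masked_log_softmax([1.0, 2.0, 3.0], [False, True, False])

# Every position masked: all entries come out as NaN.
fully_masked = masked_log_softmax([1.0, 2.0, 3.0], [True, True, True])
```

If this is what happens in a given run, checking whether a mask row is entirely True just before the assertion would confirm it; if not, the NaN likely originates earlier, for example in the network outputs themselves.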

JSA-458 commented 2 years ago

Thank you very much for sharing. I have a question: after running "python main.py", how does the model decide that training is complete? So far I have reached more than 400,000 updates and have been training on the CPU for 5 days. How many updates does it take for training to end? Also, would using CUDA greatly shorten the training time? I ask because your 2021 article reports a training time of 16 h. Thank you in advance for your reply.

alexfrom0815 commented 2 years ago

Hello, I currently do not set an explicit termination condition in this program. I usually watch the training curve in TensorBoard (logs are saved in 'logs/runs') and stop the program once the reward curve no longer trends upward. Best wishes! :)
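The manual rule described above (stop once the reward curve stops rising) could be automated with a simple plateau check. This is a minimal sketch under assumptions, not something the repository provides: the function name reward_has_plateaued and the window and min_gain values are hypothetical and would need tuning against the actual reward scale.

```python
from statistics import mean


def reward_has_plateaued(rewards, window=100, min_gain=0.01):
    """Return True when the mean reward of the latest `window` entries has
    improved by less than `min_gain` over the mean of the window before it.

    `rewards` is a list of logged mean rewards, oldest first.
    All names and defaults here are illustrative assumptions.
    """
    if len(rewards) < 2 * window:
        return False  # not enough history to compare two windows
    recent = mean(rewards[-window:])
    previous = mean(rewards[-2 * window:-window])
    return recent - previous < min_gain


# A steadily rising curve is not treated as a plateau ...
rising = [0.1 * i for i in range(200)]
# ... while a flat curve is.
flat = [8.0] * 200
```

Comparing two window means rather than two single values keeps the check from firing on the episode-to-episode noise that is typical of reinforcement-learning reward curves.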

chefwang-cloid commented 2 years ago

Dear author: I have a question. How did you generate the 3D packing figures shown in Figure 8 in the appendix of the paper?

alexfrom0815 commented 2 years ago

We first generate the mesh data for the boxes and then render the mesh with graphics software (Deep Exploration in our case).

chefwang-cloid commented 2 years ago

Thank you very much. I will try it.

alexfrom0815 commented 1 year ago

Yes, CUDA is necessary for our DRL training. We did not set a fixed time at which training terminates; I usually watch the reward curve and manually stop the training once it no longer increases significantly.

alexfrom0815 / Online-3D-BPP-PCT

AssertionError #6