ZouJiu1 / LSQplus

LSQ+ or LSQplus

Have a problem when using LSQ+_V2 #11

Open lulululichuan opened 1 year ago

lulululichuan commented 1 year ago

Hi author, thank you for your great work! I meet a problem when using LSQ+_V2:

```
Traceback (most recent call last):
  File "train.py", line 273, in <module>
    train(args)
  File "train.py", line 201, in train
    scaler.scale(loss).backward()
  File "/home/work/ssd1/anaconda3/envs/py38/lib/python3.8/site-packages/torch/_tensor.py", line 396, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/work/ssd1/anaconda3/envs/py38/lib/python3.8/site-packages/torch/autograd/__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: Function ALSQPlusBackward returned an invalid gradient at index 2 - got [1] but expected shape compatible with [0]
```

It seems that the backward func of ALSQPlus has some errors. Any advice on how to solve this problem? Thanks in advance!

ZouJiu1 commented 1 year ago

Index 2 in `return grad_weight, grad_alpha, None, None, None, grad_beta` is None; it corresponds to `g` in the forward inputs `(weight, alpha, g, Qn, Qp, beta)`. `g` has no gradient, so its gradient should be None.
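The convention described above can be sketched with a minimal custom `torch.autograd.Function`. This is an illustrative sketch, not the repo's actual ALSQPlus code: the name `FakeQuantSketch` and the STE-style gradients are assumptions, but the return signature shows the rule that `backward` must return one entry per forward input, with None for non-differentiable inputs like `g`, `Qn`, `Qp`:

```python
import torch

class FakeQuantSketch(torch.autograd.Function):
    """Sketch of a quantizer Function with both tensor and non-tensor inputs."""

    @staticmethod
    def forward(ctx, weight, alpha, g, Qn, Qp, beta):
        ctx.save_for_backward(weight, alpha, beta)
        ctx.others = (g, Qn, Qp)  # plain Python numbers, no gradients
        q = torch.clamp((weight - beta) / alpha, Qn, Qp).round()
        return q * alpha + beta

    @staticmethod
    def backward(ctx, grad_output):
        weight, alpha, beta = ctx.saved_tensors
        g, Qn, Qp = ctx.others
        # Illustrative straight-through-style gradients, not the paper's exact ones.
        grad_weight = grad_output.clone()
        grad_alpha = (grad_output * g).sum().reshape(alpha.shape)
        grad_beta = grad_output.sum().reshape(beta.shape)
        # One entry per forward input; g, Qn, Qp are non-tensors, so None.
        return grad_weight, grad_alpha, None, None, None, grad_beta
```

Note that `grad_alpha` and `grad_beta` are reshaped to exactly match their inputs' shapes; autograd rejects gradients whose shapes do not match the corresponding forward input.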

ZouJiu1 commented 1 year ago

That is the explanation for this kind of backward error: `RuntimeError: function ALSQPlusBackward returned a gradient different than None at position ...`

lulululichuan commented 1 year ago

> Index 2 in `return grad_weight, grad_alpha, None, None, None, grad_beta` is None; it corresponds to `g` in the forward inputs `(weight, alpha, g, Qn, Qp, beta)`. `g` has no gradient, so its gradient should be None.

Yes, I used the code from your repo, and the `backward` of ALSQPlus already has `return grad_weight, grad_alpha, None, None, None, grad_beta` in the original code.

wenpingd commented 1 year ago

> Hi author, thank you for your great work! I meet a problem when using LSQ+_V2: [...] `RuntimeError: Function ALSQPlusBackward returned an invalid gradient at index 2 - got [1] but expected shape compatible with [0]` It seems that the backward func of ALSQPlus has some errors, any advice of how to solve this problem? Thanks in advance!

Hi lulululichuan, have you solved this problem?

MJITG commented 7 months ago

This problem occurs when training with multiple GPUs; single-GPU training is fine.
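One hedged guess at why only multi-GPU runs hit this (an assumption, not confirmed in this thread): replicating or gathering across devices can turn a 0-dim scalar into a shape-`[1]` tensor, so the shapes a custom `backward` returns no longer match its inputs. Keeping the learnable scale and offset as explicit shape-`[1]` parameters is one way to keep shapes stable; the module below is an illustrative sketch, not the repo's API:

```python
import torch
import torch.nn as nn

class QuantScale(nn.Module):
    """Sketch: store the quantizer's learnable scale/offset as shape-[1]
    parameters rather than 0-dim scalars, so the shapes seen by a custom
    backward are identical in single- and multi-GPU setups."""

    def __init__(self, init_alpha: float = 1.0, init_beta: float = 0.0):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor([init_alpha]))  # shape [1]
        self.beta = nn.Parameter(torch.tensor([init_beta]))    # shape [1]

    def forward(self, x):
        # Affine pre-scaling step, illustrative only.
        return (x - self.beta) / self.alpha
```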