QueuQ / CGLB


GCGL, TWP error when using higher GCN hidden units #16

Closed WeiWeic6222848 closed 1 year ago

WeiWeic6222848 commented 1 year ago

When the GCN hidden unit size is larger than the input size, TWP will throw the following error:

Traceback (most recent call last):
  File "/mnt/c/Users/Wei_Wei/PycharmProjects/CGLB/GCGL/train.py", line 141, in main
    AP, AF, acc_matrix,cls_matrix = main(args, valid=True)
  File "/mnt/c/Users/Wei_Wei/PycharmProjects/CGLB/GCGL/pipeline.py", line 529, in pipeline_multi_class
    train_func(train_loader, loss_criterion, tid, args)
  File "/mnt/c/Users/Wei_Wei/PycharmProjects/CGLB/./GCGL/Baselines/twp_model.py", line 188, in observe_clsIL
    eloss.backward()
  File "/home/wwei/miniconda3/envs/GNN-DL-py38/lib/python3.8/site-packages/torch/_tensor.py", line 488, in backward
    torch.autograd.backward(
  File "/home/wwei/miniconda3/envs/GNN-DL-py38/lib/python3.8/site-packages/torch/autograd/__init__.py", line 197, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
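
For context, here is a minimal standalone illustration (not the CGLB code itself) of how this RuntimeError arises: calling `.backward()` on a tensor that was computed purely from inputs that do not require gradients produces exactly this message.

```python
import torch

x = torch.randn(4, 3)                # plain input features, requires_grad defaults to False
e = torch.sigmoid(x @ x.t()).sum()   # "edge weight"-style scalar, no trainable weight involved

try:
    e.backward()                     # e has no grad_fn, so autograd has nothing to differentiate
except RuntimeError as err:
    print(err)                       # element 0 of tensors does not require grad and does not have a grad_fn
```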

The error originates from the following code snippet: https://github.com/QueuQ/CGLB/blob/793a346b2fe867087e76f9d553bf6d8a0afad6d8/GCGL/Backbones/graphconv.py#L155-L188

When the hidden unit size is larger than the input size, the if condition is false, so the else branch is executed. In that branch, for the first GCN layer, the edge weights are computed from the raw input features without involving the trainable weight, so they carry no grad_fn and cannot be optimized, hence the error above.
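
A hedged sketch of the two code paths, using a toy dense-adjacency layer with illustrative names only (not the actual graphconv.py implementation):

```python
import torch
import torch.nn as nn

class ToyGraphConv(nn.Module):
    """Toy sketch of the two branches; names and shapes are illustrative,
    not the real GCGL/Backbones/graphconv.py implementation."""
    def __init__(self, in_feats, out_feats):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_feats, out_feats))

    def forward(self, adj, feat):
        if feat.shape[1] > self.weight.shape[1]:
            # in_feats > out_feats: multiply by the weight first, then aggregate.
            # The edge scores are built from W-transformed features,
            # so they depend on self.weight and carry a grad_fn.
            h = feat @ self.weight
            e = torch.sigmoid(adj * (h @ h.t()))
            out = adj @ h
        else:
            # in_feats <= out_feats (hidden units larger than the input):
            # aggregate first, multiply by the weight later.
            # The edge scores are built from the raw input features; for the
            # first layer those do not require grad, so a loss built from `e`
            # has no grad_fn and eloss.backward() raises the RuntimeError above.
            e = torch.sigmoid(adj * (feat @ feat.t()))
            out = (adj @ feat) @ self.weight
        return out, e
```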

QueuQ commented 1 year ago


Thanks a lot for pointing out this bug. I have now invalidated the else branch temporarily to avoid this issue. I tested it, and it should run now.
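
For reference, the workaround described above amounts to always taking the transform-first path, so the edge scores always depend on the trainable weight. Continuing the toy sketch from earlier in the thread (an assumption about the shape of the fix, not the actual patched code):

```python
import torch
import torch.nn as nn

class ToyGraphConvPatched(nn.Module):
    """Toy sketch of the workaround: always apply the trainable weight
    before computing edge scores, so they always carry a grad_fn."""
    def __init__(self, in_feats, out_feats):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_feats, out_feats))

    def forward(self, adj, feat):
        h = feat @ self.weight                 # trainable transform first
        e = torch.sigmoid(adj * (h @ h.t()))   # edge scores now depend on self.weight
        out = adj @ h                          # then aggregate
        return out, e
```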