QueuQ / CGLB


GCGL, TWP error when using higher GCN hidden units #16

Closed WeiWeic6222848 closed 1 year ago

WeiWeic6222848 commented 1 year ago

When the GCN hidden unit size is larger than the input size, TWP will throw the following error:

Traceback (most recent call last):
  File "/mnt/c/Users/Wei_Wei/PycharmProjects/CGLB/GCGL/train.py", line 141, in main
    AP, AF, acc_matrix,cls_matrix = main(args, valid=True)
  File "/mnt/c/Users/Wei_Wei/PycharmProjects/CGLB/GCGL/pipeline.py", line 529, in pipeline_multi_class
    train_func(train_loader, loss_criterion, tid, args)
  File "/mnt/c/Users/Wei_Wei/PycharmProjects/CGLB/./GCGL/Baselines/twp_model.py", line 188, in observe_clsIL
    eloss.backward()
  File "/home/wwei/miniconda3/envs/GNN-DL-py38/lib/python3.8/site-packages/torch/_tensor.py", line 488, in backward
    torch.autograd.backward(
  File "/home/wwei/miniconda3/envs/GNN-DL-py38/lib/python3.8/site-packages/torch/autograd/__init__.py", line 197, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
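
For context, here is a minimal standalone illustration (not the CGLB code itself) of how this RuntimeError arises: calling `.backward()` on a tensor that was computed purely from inputs that do not require gradients produces exactly this message.

```python
import torch

x = torch.randn(4, 3)                # plain input features, requires_grad defaults to False
e = torch.sigmoid(x @ x.t()).sum()   # "edge weight"-style scalar, no trainable weight involved

try:
    e.backward()                     # e has no grad_fn, so autograd has nothing to differentiate
except RuntimeError as err:
    print(err)                       # element 0 of tensors does not require grad and does not have a grad_fn
```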

The error originates from the following code snippet: https://github.com/QueuQ/CGLB/blob/793a346b2fe867087e76f9d553bf6d8a0afad6d8/GCGL/Backbones/graphconv.py#L155-L188

When the hidden unit size is larger than the input size, the if condition is false, so the else branch is executed. In that branch, for the first GCN layer, the edge weights are computed from the raw input features without involving the trainable weight, so they carry no grad_fn and cannot be optimized, hence the error above.
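
A hedged sketch of the two code paths, using a toy dense-adjacency layer with illustrative names only (not the actual graphconv.py implementation):

```python
import torch
import torch.nn as nn

class ToyGraphConv(nn.Module):
    """Toy sketch of the two branches; names and shapes are illustrative,
    not the real GCGL/Backbones/graphconv.py implementation."""
    def __init__(self, in_feats, out_feats):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_feats, out_feats))

    def forward(self, adj, feat):
        if feat.shape[1] > self.weight.shape[1]:
            # in_feats > out_feats: multiply by the weight first, then aggregate.
            # The edge scores are built from W-transformed features,
            # so they depend on self.weight and carry a grad_fn.
            h = feat @ self.weight
            e = torch.sigmoid(adj * (h @ h.t()))
            out = adj @ h
        else:
            # in_feats <= out_feats (hidden units larger than the input):
            # aggregate first, multiply by the weight later.
            # The edge scores are built from the raw input features; for the
            # first layer those do not require grad, so a loss built from `e`
            # has no grad_fn and eloss.backward() raises the RuntimeError above.
            e = torch.sigmoid(adj * (feat @ feat.t()))
            out = (adj @ feat) @ self.weight
        return out, e
```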

QueuQ commented 1 year ago


Thanks a lot for pointing out this bug. I have now invalidated the else branch temporarily to avoid this issue. I tested it, and it should run now.
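
For reference, the workaround described above amounts to always taking the transform-first path, so the edge scores always depend on the trainable weight. Continuing the toy sketch from earlier in the thread (an assumption about the shape of the fix, not the actual patched code):

```python
import torch
import torch.nn as nn

class ToyGraphConvPatched(nn.Module):
    """Toy sketch of the workaround: always apply the trainable weight
    before computing edge scores, so they always carry a grad_fn."""
    def __init__(self, in_feats, out_feats):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(in_feats, out_feats))

    def forward(self, adj, feat):
        h = feat @ self.weight                 # trainable transform first
        e = torch.sigmoid(adj * (h @ h.t()))   # edge scores now depend on self.weight
        out = adj @ h                          # then aggregate
        return out, e
```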