benedekrozemberczki / pytorch_geometric_temporal

PyTorch Geometric Temporal: Spatiotemporal Signal Processing with Neural Machine Learning Models (CIKM 2021)

TSAGCN test failed on GPU #186

Closed · BlueSkyLT closed this issue 1 year ago

BlueSkyLT commented 2 years ago

When running the test suite, all tests passed except for:

test/attention_test.py:715:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../../anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py:1102: in _call_impl
    return forward_call(*input, **kwargs)
torch_geometric_temporal/nn/attention/tsagcn.py:339: in forward
    y = self.relu(self.tcn1(self.gcn1(x)) + self.residual(x))
../../../anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py:1102: in _call_impl
    return forward_call(*input, **kwargs)
torch_geometric_temporal/nn/attention/tsagcn.py:262: in forward
    y = self._non_adaptive_forward(x, y)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = UnitGCN(
  (conv_d): ModuleList(
    (0): Conv2d(100, 10, kernel_size=(1, 1), stride=(1, 1))
    (1): Conv2d(100, 10, ...ack_running_stats=True)
  (soft): Softmax(dim=-2)
  (tan): Tanh()
  (sigmoid): Sigmoid()
  (relu): ReLU(inplace=True)
)
x = tensor([[[[ 0.3476,  0.1290,  0.4463,  ...,  0.4613,  0.2014,  0.0761],
              [ 1.3476,  1.1290,  1.4463,  ...,  1...,  2.4257,  2.1628],
              [ 3.6332,  4.1489,  4.0730,  ...,  4.4859,  3.4257,  3.1628]]]],
           device='cuda:0')
y = None

    def _non_adaptive_forward(self, x, y):
        N, C, T, V = x.size()
        for i in range(self.num_subset):
            A1 = self.A[i]
            A2 = x.view(N, C * T, V)
>           z = self.conv_d[i](torch.matmul(A2, A1).view(N, C, T, V))
E           RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat2 in method wrapper_mm)

torch_geometric_temporal/nn/attention/tsagcn.py:251: RuntimeError

Similar to #46: if I force it to run on the CPU, the test passes:

    # device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    device = torch.device("cpu")
benedekrozemberczki commented 2 years ago

I suppose there is a tensor that is not on the GPU. Can you guess which one?

cshjin commented 10 months ago

The issue still exists.

    def _non_adaptive_forward(self, x, y):
        N, C, T, V = x.size()
        for i in range(self.num_subset):
            A1 = self.A[i]                  # A1 is on CPU
            A2 = x.view(N, C * T, V)        # A2 is on GPU (same device as `x`)
            z = self.conv_d[i](torch.matmul(A2, A1).view(N, C, T, V))  # fails here
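For context, this is standard PyTorch behavior: a plain tensor stored as a module attribute is not moved by `model.to(device)`; only parameters and registered buffers are. A minimal sketch in plain PyTorch (the `PlainAttr` class is illustrative, not the actual UnitGCN code):

    import torch
    from torch import nn

    class PlainAttr(nn.Module):
        def __init__(self):
            super().__init__()
            # Plain tensor attribute: model.to(device) does NOT move it.
            self.A = torch.eye(3)

        def forward(self, x):
            return torch.matmul(x, self.A)

    if torch.cuda.is_available():
        model = PlainAttr().to("cuda")
        x = torch.randn(2, 3, device="cuda")
        model(x)  # RuntimeError: ... cuda:0 and cpu, as in the traceback above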

A temporary fix could be:

    def _non_adaptive_forward(self, x, y):
        _device = x.device
        N, C, T, V = x.size()
        for i in range(self.num_subset):
            # Move the adjacency slice to the same device as the input
            A1 = self.A[i].to(_device)
            A2 = x.view(N, C * T, V)
            z = self.conv_d[i](torch.matmul(A2, A1).view(N, C, T, V))
            # Accumulate over subsets, as in the original implementation
            y = z + y if y is not None else z
        return y
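A more durable fix than the per-call `.to(_device)` would be to register `A` as a buffer when the module is constructed, so that `model.to(device)` and `.cuda()` move it along with the parameters. A minimal sketch of the idea in plain PyTorch (`BufferedAttr` is illustrative; the actual constructor in tsagcn.py may differ):

    import torch
    from torch import nn

    class BufferedAttr(nn.Module):
        def __init__(self, A: torch.Tensor):
            super().__init__()
            # register_buffer ties A to the module's state: .to() and .cuda()
            # move it with the parameters, and it is saved in state_dict().
            self.register_buffer("A", A)

        def forward(self, x):
            return torch.matmul(x, self.A)

    if torch.cuda.is_available():
        model = BufferedAttr(torch.eye(3)).to("cuda")
        x = torch.randn(2, 3, device="cuda")
        print(model(x).device)  # cuda:0, no device mismatch

Unlike the per-call `.to(_device)` workaround above, this avoids a host-to-device copy of `A` on every forward pass.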