Cannot change device from cuda:0 to cuda:1

PolarisRisingWar commented 2 years ago

I'm using code based on https://github.com/DSE-MSU/DeepRobust/blob/master/examples/graph/test_nettack.py, with a few small changes. I want to change the device from cuda:0 to cuda:1 on line 20, device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu"), but after doing so I get this error:

Traceback (most recent call last):
  File "/home/wanghuijuan/whj_code2/aisafety/try3.py", line 49, in <module>
    surrogate.fit(features, adj, labels, idx_train, idx_val, patience=30)
  File "/home/wanghuijuan/anaconda3/envs/cuda102/lib/python3.9/site-packages/deeprobust/graph/defense/gcn.py", line 192, in fit
    self._train_with_early_stopping(labels, idx_train, idx_val, train_iters, patience, verbose)
  File "/home/wanghuijuan/anaconda3/envs/cuda102/lib/python3.9/site-packages/deeprobust/graph/defense/gcn.py", line 261, in _train_with_early_stopping
    output = self.forward(self.features, self.adj_norm)
  File "/home/wanghuijuan/anaconda3/envs/cuda102/lib/python3.9/site-packages/deeprobust/graph/defense/gcn.py", line 126, in forward
    x = self.gc1(x, adj)
  File "/home/wanghuijuan/anaconda3/envs/cuda102/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/wanghuijuan/anaconda3/envs/cuda102/lib/python3.9/site-packages/deeprobust/graph/defense/gcn.py", line 40, in forward
    output = torch.spmm(adj, support)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument mat2 in method wrapper__mm)

I honestly don't know where one of these two tensors gets moved to cuda:0. Is there anywhere that cuda:0 is hard-coded as the device?
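One way to narrow this down is to list the device of every parameter, buffer and stored tensor the model holds. The sketch below uses only standard PyTorch module APIs (named_parameters, named_buffers). The helper names tensor_devices and mismatched are hypothetical, and the attribute names surrogate.features and surrogate.adj_norm are assumed from the self.features and self.adj_norm references in the traceback.

```python
import torch
from torch import nn


def tensor_devices(module: nn.Module, **tensors: torch.Tensor) -> dict:
    """Hypothetical helper: map each parameter, buffer and extra tensor to its device string."""
    devices = {}
    for name, param in module.named_parameters():
        devices[f"param:{name}"] = str(param.device)
    for name, buf in module.named_buffers():
        devices[f"buffer:{name}"] = str(buf.device)
    # Extra tensors (e.g. cached features or a normalized adjacency matrix)
    # are passed by keyword so they show up under a readable name.
    for name, tensor in tensors.items():
        devices[f"tensor:{name}"] = str(tensor.device)
    return devices


def mismatched(devices: dict, expected: str) -> dict:
    """Hypothetical helper: keep only the entries that are not on the expected device."""
    return {key: dev for key, dev in devices.items() if dev != expected}
```

For example, after the failure, mismatched(tensor_devices(surrogate, features=surrogate.features, adj_norm=surrogate.adj_norm), "cuda:1") should list whichever entries did not end up on cuda:1.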

PolarisRisingWar commented 2 years ago

I found the bug in deeprobust/graph/defense/gcn.py, lines 180-186:

        if normalize:
            if utils.is_sparse_tensor(adj):
                adj_norm = utils.normalize_adj_tensor(adj, sparse=True)
            else:
                adj_norm = utils.normalize_adj_tensor(adj)
        else:
            adj_norm = adj

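After utils.normalize_adj_tensor() runs, adj_norm ends up on the generic CUDA device instead of on cuda:1. In deeprobust/graph/utils.py, line 202, the code is device = torch.device("cuda" if adj.is_cuda else "cpu"). A "cuda" device without an index means the current CUDA device, which is cuda:0 unless it has been changed, so the normalized adjacency matrix is moved to cuda:0 while the other tensors stay on cuda:1. I think this bug can be fixed by using device = adj.device instead.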
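To illustrate the proposed fix, here is a minimal sketch of a dense symmetric normalization, D^-1/2 (A + I) D^-1/2, that takes its device from the input tensor. It assumes the dense branch of normalize_adj_tensor works along these lines, but it is not DeepRobust's code, and the name normalize_adj_dense is hypothetical. The key point is that torch.device("cuda") carries no index, so CUDA resolves it to the current device, whereas adj.device keeps the exact ordinal (e.g. cuda:1).

```python
import torch


def normalize_adj_dense(adj: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: return D^-1/2 (A + I) D^-1/2 on the same device as adj."""
    device = adj.device  # proposed fix: reuse the input's exact device, index included
    mx = adj + torch.eye(adj.shape[0], dtype=adj.dtype, device=device)
    rowsum = mx.sum(1)
    r_inv = rowsum.pow(-0.5).flatten()
    r_inv[torch.isinf(r_inv)] = 0.0  # guard against zero row sums (0 ** -0.5 = inf)
    r_mat_inv = torch.diag(r_inv)
    return r_mat_inv @ mx @ r_mat_inv


if __name__ == "__main__":
    # An index-less device object has no ordinal; CUDA maps it to the current device.
    print(torch.device("cuda").index)
    adj = torch.tensor([[0.0, 1.0], [1.0, 0.0]])
    if torch.cuda.device_count() > 1:
        adj = adj.to("cuda:1")
    print(normalize_adj_dense(adj).device)  # matches adj.device
```

As a workaround that avoids editing installed files, calling torch.cuda.set_device(1) near the top of the script should make the index-less "cuda" device resolve to cuda:1 too; whether that covers every code path in DeepRobust is an untested assumption.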

EdisonLeeeee commented 2 years ago

Maybe you should run git pull first; the bug was fixed in commit 5c85bb69760bef9f37e1284cb136d45d60b8126c.

PolarisRisingWar commented 2 years ago

OK, got it! The reason is that I installed the package with pip, because I ran into some strange problems when installing it from the git repository with setup.py. So I had to manually edit the files pip installed to fix this problem.
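Because pip installs its own copy into site-packages, a git pull in a separate clone does not change the code Python actually imports. A quick way to confirm which copy is in use is to ask the import system where the package lives. This is a minimal sketch built on importlib.util.find_spec; the helper name import_origin is hypothetical.

```python
import importlib.util
from typing import Optional


def import_origin(package: str) -> Optional[str]:
    """Hypothetical helper: return the file Python would import for a top-level package, or None."""
    spec = importlib.util.find_spec(package)
    # spec is None when the package is not installed or not on sys.path.
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # For a pip install this typically points into .../site-packages/deeprobust/.
    print(import_origin("deeprobust"))
```

If the printed path is inside site-packages rather than the git clone, edits or pulls in the clone will not take effect.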

DSE-MSU / DeepRobust, issue #86