DSE-MSU / DeepRobust

A PyTorch adversarial library for attack and defense methods on images and graphs
MIT License

CUDA memory cannot be emptied #90

Open Kidleyh opened 2 years ago

Kidleyh commented 2 years ago

Hello, I used deeprobust 0.2.2 on Windows 11 and everything was fine. But yesterday I used deeprobust 0.2.4 on Ubuntu 16.04 LTS, and the function self.inner_train(features, adj_norm, idx_train, idx_unlabeled, labels) does not free the CUDA memory after any epoch except epoch 0, so the GPU runs out of memory after only 200 perturbations. For example, on Windows at epoch 1 the memory allocated is 561737216 before inner_train and 511405056 after it, but on Ubuntu at epoch 1 the memory allocated is 561737216 before inner_train and 586903040 after it. I debugged for an hour but could not find the error, so I have to open this issue. Looking forward to your reply, thank you!
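
For anyone trying to reproduce the measurement, here is a minimal sketch of how the per-epoch numbers above can be collected with PyTorch's built-in memory counters; `run_one_epoch` is a hypothetical stand-in for whatever wraps the call to inner_train in your script.

```python
import torch

def log_cuda_memory(tag):
    # torch.cuda.memory_allocated() reports bytes currently held by live tensors;
    # torch.cuda.memory_reserved() reports bytes held by the caching allocator.
    print(f"{tag}: allocated={torch.cuda.memory_allocated()} "
          f"reserved={torch.cuda.memory_reserved()}")

def run_one_epoch(epoch):
    # hypothetical stand-in for one perturbation step of Metattack
    log_cuda_memory(f"epoch {epoch} before inner_train")
    # ... the attack's inner_train call would happen here ...
    log_cuda_memory(f"epoch {epoch} after inner_train")
    # Releasing cached blocks does not fix a leak, but it makes the
    # allocated/reserved counters easier to interpret between epochs.
    torch.cuda.empty_cache()
```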

Kidleyh commented 2 years ago

I also tried deeprobust==0.2.2 on Ubuntu 16.04 LTS, and it has the same problem as 0.2.4.

ChandlerBang commented 2 years ago

Can you provide more details on this bug? Did you try examples/test_mettack.py?

Kidleyh commented 2 years ago

Yes, I also tried examples/test_mettack.py, and it shows the same problem in self.inner_train. I can also reproduce this bug on Windows 11 when I use torch==1.10.0, so I think it may be a problem with torch==1.10.0. However, I haven't tried torch<1.10.0 on Ubuntu 16.04 LTS, because when I run pip install deeprobust, the package requires me to install torch==1.10.0.

ChandlerBang commented 2 years ago

Hi, I just tried examples/test_mettack.py with torch==1.10.0 and it works fine for me. Can you provide more details on the error (by copying the whole error message)?

Kidleyh commented 2 years ago

The error message may not tell us anything useful, because it is just a CUDA out-of-memory error caused by self.inner_train not freeing its memory normally. I can successfully run examples/test_mettack.py with torch==1.8.0 on Ubuntu 16.04 LTS without any issue. When I use torch==1.10.0 on Windows 11, the error message is:

RuntimeError: CUDA out of memory. Tried to allocate 24.00 MiB (GPU 0; 6.00 GiB total capacity; 3.42 GiB already allocated; 0 bytes free; 4.43 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF.

The torch==1.10.0 environment on Ubuntu 16.04 LTS has already been uninstalled, so I cannot copy the error message from there.
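
As an aside, the max_split_size_mb hint from that traceback can be tried without touching the library code. A minimal sketch follows (the 128 MiB value is just an example); note this only mitigates allocator fragmentation and will not recover memory that inner_train never releases.

```python
import os

# Must be set before the CUDA caching allocator is initialized,
# i.e. before the first CUDA tensor is created. Safest is to set it
# before importing torch (or export it in the shell).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # noqa: E402

x = torch.zeros(1, device="cuda")  # allocator now uses the setting above
```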

Kidleyh commented 2 years ago

Now I can run the code normally with torch==1.8.0, but I'm afraid the code has some issues with torch==1.10.0. Anyway, I think you could check self.inner_train in Metattack. I am very grateful for your work and your help, thank you!

ChandlerBang commented 2 years ago

Ok, let me see if I can figure it out.

Kidleyh commented 2 years ago

Hi, sorry to bother you again. I also want to know how much CUDA memory is enough for running MetaApprox to attack Pubmed. I have attacked Cora, Citeseer, and Polblogs successfully, but when I attack Pubmed the error message shows CUDA out of memory. By the way, my GPU has 12 GB of memory.

ChandlerBang commented 2 years ago

I am not sure about the exact GPU memory usage for attacking Pubmed, but I managed to run it on a GPU with 32 GB. The memory complexity of Metattack is very high since the search space is quadratic in the number of nodes.
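
To illustrate the quadratic blow-up, here is a rough back-of-envelope estimate. The node counts are the commonly reported sizes of these benchmark graphs, and the estimate only counts a single dense float32 N x N matrix; the attack keeps several such buffers plus the meta-gradient graph, so actual usage is a multiple of this.

```python
# rough size of one dense float32 N x N matrix, as used for the
# adjacency/perturbation buffers in untargeted structure attacks
def dense_square_gib(num_nodes, bytes_per_elem=4):
    return num_nodes ** 2 * bytes_per_elem / 1024 ** 3

for name, n in [("cora", 2708), ("citeseer", 3327),
                ("polblogs", 1222), ("pubmed", 19717)]:
    print(f"{name:9s} n={n:6d}  one dense NxN matrix ~ {dense_square_gib(n):.2f} GiB")

# Pubmed alone is ~1.45 GiB per dense matrix, and the attack holds
# multiple such tensors (and their gradients) at once.
```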

You may turn to some scalable attack instead, e.g., https://github.com/DSE-MSU/DeepRobust/blob/master/deeprobust/graph/targeted_attack/sga.py.

Leirunlin commented 2 years ago

Hi! I've encountered the same problem. Metattack works fine with the following environment on Ubuntu 16.04.12:

numpy==1.18.1
scipy==1.6.2
torch==1.8.1
torch_geometric==1.6.3
torch_scatter==2.0.9
torch_sparse==0.6.12

But it shows CUDA out of memory with the latest version of torch on Ubuntu 20.04.1:

deeprobust==0.2.6
numpy==1.23.3
scikit_learn==1.1.3
scipy==1.8.1
torch==1.12.1
torch_geometric==2.1.0
torch_sparse==0.6.15

I also find that self.inner_train does not free the gradients. I hope my case will be helpful in solving the problem.
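
For anyone digging into this further, a common way to confirm what inner_train leaves behind is to enumerate the CUDA tensors that are still alive between epochs. This is only a generic debugging sketch, not part of DeepRobust:

```python
import gc
import torch

def report_live_cuda_tensors():
    """Print counts of CUDA tensors still reachable by the garbage collector.

    Run this after inner_train returns; shapes whose counts keep growing
    across epochs point at tensors that were never freed (e.g. parameter
    .grad buffers or activations kept alive by a retained autograd graph).
    """
    seen = {}
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj) and obj.is_cuda:
                key = (tuple(obj.shape), str(obj.dtype))
                seen[key] = seen.get(key, 0) + 1
        except Exception:
            # some tracked objects raise when inspected; skip them
            continue
    for (shape, dtype), count in sorted(seen.items(), key=lambda kv: -kv[1]):
        print(f"{count:4d} x {dtype} {shape}")
```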