RingBDStack / SUGAR

Code for "SUGAR: Subgraph Neural Network with Reinforcement Pooling and Self-Supervised Mutual Information Mechanism"

Gradient explosion and memory error in DD, NCI1, NCI109 #2

Closed QiyaoHuang closed 2 years ago

QiyaoHuang commented 2 years ago
  1. In DD:

     Traceback (most recent call last):
       File "transform.py", line 443, in <module>
         adjs, d_es = paser.main(save=True)
       File "transform.py", line 153, in main
         d_es, adj_com = self.ex_edges()
       File "transform.py", line 87, in ex_edges
         adj = np.zeros((self.n, self.n))
     numpy.core._exceptions.MemoryError: Unable to allocate 112. GiB for an array with shape (122494, 122494) and data type float64

  2. In NCI1 and NCI109:

     folds 1/10: 0%| | 1/1000 [04:11<41:05:52, 148.10s/it, k:0.80, loss: nan, best_acc:1.00, RL:0]
     folds 1/10: 0%| | 2/1000 [04:11<33:48:52, 121.98s/it, k:0.80, loss: nan, best_acc:1.00, RL:0]
     folds 1/10: 0%| | 2/1000 [05:52<33:48:52, 121.98s/it, k:0.80, loss: nan, best_acc:1.00, RL:0]
     folds 1/10: 0%| | 2/1000 [05:52<33:48:52, 121.98s/it, k:0.80, loss: nan, best_acc:1.00, RL:0]
     folds 1/10: 0%| | 3/1000 [05:52<31:02:52, 112.11s/it, k:0.80, loss: nan, best_acc:1.00, RL:0]
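
The allocation size in the DD traceback follows directly from the array shape: 122494 × 122494 float64 entries at 8 bytes each comes to roughly 112 GiB. The sketch below reproduces that arithmetic and shows one possible workaround, which is to keep the edges as a list and build a small dense block only for the nodes of a single subgraph. The function names `dense_adjacency_gib` and `subgraph_adjacency` are hypothetical and do not come from transform.py.

```python
import numpy as np


def dense_adjacency_gib(n, dtype=np.float64):
    """Memory in GiB that np.zeros((n, n), dtype=dtype) would request."""
    return n * n * np.dtype(dtype).itemsize / 2**30


def subgraph_adjacency(edges, nodes):
    """Dense adjacency restricted to `nodes`, built from an edge list.

    `edges` is an iterable of (u, v) node-id pairs (treated as undirected);
    `nodes` is the list of node ids that make up one subgraph.
    Both names are assumptions made for this illustration.
    """
    index = {node: i for i, node in enumerate(nodes)}
    adj = np.zeros((len(nodes), len(nodes)), dtype=np.float32)
    for u, v in np.asarray(edges).tolist():
        # Skip edges that leave the subgraph.
        if u in index and v in index:
            adj[index[u], index[v]] = 1.0
            adj[index[v], index[u]] = 1.0
    return adj


# Size of the array requested in the traceback, in GiB.
print(round(dense_adjacency_gib(122494), 1))

# A 3-node block taken from a 4-node path graph 0-1-2-3.
print(subgraph_adjacency([(0, 1), (1, 2), (2, 3)], nodes=[1, 2, 3]))
```

With this layout the memory cost scales with the number of edges plus the size of the largest subgraph, rather than with the square of the total node count.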
Suchun-sv commented 2 years ago
  1. It did consume a lot of memory; thank you for the warning. We are about to package the binary DD output and release it here.

  2. A high learning rate, the environment version, a bad random initialization, or some other reason may cause this. I tried NCI1 and NCI109 just now, and they work normally on my machine. Maybe you can try other parameter settings, printing intermediate outputs, or using gradient clipping.
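
As a rough illustration of the last two suggestions, here is a minimal, framework-agnostic sketch of gradient clipping by global norm in NumPy, with a check that fails early when the gradients are already non-finite. It is not the training code from this repository; `clip_by_global_norm` and its `max_norm` parameter are names chosen for this example. Deep learning frameworks ship their own equivalents (for example `tf.clip_by_global_norm` in TensorFlow and `torch.nn.utils.clip_grad_norm_` in PyTorch), which would normally be used instead.

```python
import numpy as np


def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient arrays so their joint L2 norm is <= max_norm.

    Returns the (possibly rescaled) gradients and the norm before clipping.
    Raises FloatingPointError if the norm is NaN or infinite, which is a
    simple form of "intermediate output" for locating where NaNs first appear.
    """
    global_norm = float(np.sqrt(sum(float(np.sum(g ** 2)) for g in grads)))
    if not np.isfinite(global_norm):
        raise FloatingPointError("non-finite gradient norm: %r" % global_norm)
    # Scale is 1.0 when the norm is already within the limit.
    scale = min(1.0, max_norm / (global_norm + 1e-12))
    return [g * scale for g in grads], global_norm


# Example: a gradient of norm 5 is scaled down to norm 1.
clipped, norm_before = clip_by_global_norm([np.array([3.0, 4.0])], max_norm=1.0)
print(norm_before, clipped)
```

Logging the norm returned here at every step is one way to see whether the loss becomes NaN because gradients grow gradually (clipping or a lower learning rate may help) or because a NaN is produced on the very first step (which points toward the input data or initialization).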

Suchun-sv commented 2 years ago

https://pan.baidu.com/s/1rdYhypHCebxBknMrIzAubg password: eqlo

Note that the features and the subgraphs have been split into the folders "features" and "subadj" to address your memory problem. You can change the code to run on them: you don't need to load everything at once, just load the specific graph when you need it during training.
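
A minimal sketch of the on-demand loading described above, under stated assumptions: the thread only names the two folders, so the one-file-per-graph layout (`features/<id>.npy`, `subadj/<id>.npy`), the `.npy` format, and the helper names `load_graph` and `iter_batches` are all hypothetical and would need to be adapted to the actual contents of the archive.

```python
import os

import numpy as np


def load_graph(root, graph_id):
    """Load the features and subgraph adjacency of a single graph from disk.

    Assumes (hypothetically) one .npy file per graph in each folder.
    """
    feats = np.load(os.path.join(root, "features", "%s.npy" % graph_id))
    subadj = np.load(os.path.join(root, "subadj", "%s.npy" % graph_id))
    return feats, subadj


def iter_batches(root, graph_ids, batch_size):
    """Yield lists of (features, subadj) pairs, reading only one batch at a time."""
    for start in range(0, len(graph_ids), batch_size):
        batch_ids = graph_ids[start:start + batch_size]
        yield [load_graph(root, g) for g in batch_ids]
```

In a training loop this could be used as `for batch in iter_batches(data_dir, train_ids, batch_size): ...`, where `data_dir` and `train_ids` are placeholders. Only the graphs of the current batch are held in memory, at the cost of extra disk reads every epoch.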