PetarV- / GAT

Graph Attention Networks (https://arxiv.org/abs/1710.10903)
https://petar-v.com/GAT/
MIT License

Only 9700 parameters and 13700 samples result in OOM on a TITAN Xp #17

Closed: Imorton-zd closed this issue 5 years ago

Imorton-zd commented 5 years ago

----- Opt. hyperparams -----
lr: 0.005
l2_coef: 0.0005
----- Archi. hyperparams -----
nb. layers: 1
nb. units per layer: [2]
nb. attention heads: [2, 1]
residual: False
nonlinearity: <function elu at 0x7fe02ecfa598>
model: <class 'models.gat.GAT'>
(13708, 13708) (13708, 600)

Total params: 9,781
Trainable params: 9,781
Non-trainable params: 0


Train on 13708 samples, validate on 13708 samples

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[13708,13708]
[[Node: graph_attention_2/leaky_re_lu_3/mul = Mul[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](graph_attention_2/leaky_re_lu_3/Const, graph_attention_2/leaky_re_lu_3/Relu)]]
[[Node: loss/add_8/_151 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_2492_loss/add_8", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
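For context, a back-of-the-envelope check of why this shape runs out of memory; the N below comes from the log, while the count of intermediate tensors is an assumption about what the dense model materialises:

```python
# Rough memory estimate for one dense N x N float32 tensor, with
# N = 13708 taken from the log above.
n = 13708
gib = n * n * 4 / 2**30  # 4 bytes per float32
print(f"{gib:.2f} GiB per dense N x N tensor")  # ~0.70 GiB

# The run uses 3 attention heads ([2, 1]), and each head builds several
# N x N intermediates (attention logits, LeakyReLU output, softmax),
# plus their gradients during training, so the total can comfortably
# exceed the 12 GiB on a TITAN Xp even though the model itself has
# fewer than 10k parameters.
```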

PetarV- commented 5 years ago

Hello,

The OOM is most likely caused by your adjacency matrix, which is stored densely as a full [N, N] tensor. Are you using GAT or SpGAT? SpGAT accepts a sparse adjacency matrix and should not run into this issue.

Thanks, Petar
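For anyone hitting the same error, here is a minimal sketch of the fix Petar suggests, i.e. keeping the adjacency matrix sparse. The variable names and the SciPy/TensorFlow glue below are assumptions for illustration, not the repo's exact preprocessing code:

```python
import numpy as np
import scipy.sparse as sp

# Toy graph with 4 nodes and 3 undirected edges, stored sparsely from
# the start: memory scales with the number of edges, not with N^2.
edges = np.array([[0, 1], [1, 2], [2, 3]])
n = 4
adj = sp.coo_matrix(
    (np.ones(len(edges), dtype=np.float32), (edges[:, 0], edges[:, 1])),
    shape=(n, n),
)
adj = adj + adj.T  # symmetrise for an undirected graph

# SpGAT-style models consume the adjacency as a tf.SparseTensor, which
# can be built from the COO triplet (indices, values, shape):
coo = adj.tocoo()
indices = np.vstack((coo.row, coo.col)).transpose()
# tf.SparseTensor(indices=indices, values=coo.data, dense_shape=coo.shape)
```

The design point is that a graph with 13708 nodes usually has far fewer than 13708^2 edges, so a sparse representation avoids ever allocating the [13708, 13708] tensor that triggered the OOM above.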