HITsz-TMG / PPAT

PPAT: Progressive Graph Pairwise Attention Network for Event Causality Identification

OOM at the second epoch #6

Closed: glare-ni closed this issue 4 months ago

glare-ni commented 7 months ago

I am training the model on the ESL dataset using an A6000 GPU, which has 48GB of memory. The first epoch finishes successfully, using more than 47GB, but I get an out-of-memory error at the second epoch. Any advice on how to debug this issue would be greatly appreciated, as I am not sure where to even begin.

lyj963 commented 4 months ago

Maybe you can significantly reduce the value of `gat_num_heads` (for example, set it to 3) to save a lot of GPU memory, and first try to run through the entire process. While the code is running, observe whether GPU memory usage grows abnormally as the epochs progress. If the code runs to completion, you can then check the peak GPU memory usage.
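
For debugging the epoch-to-epoch growth, here is a minimal sketch of per-epoch memory tracking in PyTorch (assuming the training loop is PyTorch-based; `train_one_epoch` is a dummy placeholder, not a function from this repo):

```python
import torch

def train_one_epoch():
    # Dummy placeholder for the actual PPAT training step, so this
    # snippet runs on its own and still exercises the GPU allocator.
    x = torch.randn(1024, 1024, device="cuda")
    _ = x @ x

for epoch in range(3):
    torch.cuda.reset_peak_memory_stats()
    train_one_epoch()
    peak_gb = torch.cuda.max_memory_allocated() / 1024**3
    alive_gb = torch.cuda.memory_allocated() / 1024**3
    print(f"epoch {epoch}: peak {peak_gb:.2f} GB, still allocated {alive_gb:.2f} GB")
```

If the "still allocated" number keeps growing between epochs, something is retaining tensor references across epochs (a common culprit is accumulating losses without calling `.item()`).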

Catworkdog commented 1 month ago

Why does the ESL dataset consume so much memory, while CTB runs with only about 8GB?

foggy-frost-forest commented 1 month ago

Hi,

Each example in the ESL dataset is a full document. If a document contains 30 events, there are $30 \times 29 / 2 = 435$ event pairs, so the graph neural network's attention matrix is $435 \times 435$. In contrast, each CTB example is a single sentence, which may contain only 2-4 events, so the number of event pairs is much smaller than in ESL.

In addition, whereas CTB uses only one layer of the intra-sentence event GNN, ESL adds three extra layers of the cross-sentence event GNN, so the ESL experiments consume more GPU and host memory.

As suggested in the comment above, you can reduce `gat_num_heads`, moderately decrease `gat_hidden_size`, or set `event_emb_mul` to False; all of these reduce the model's parameter count.
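
For intuition, the pair count grows quadratically in the number of events per example; a quick back-of-the-envelope script (plain Python arithmetic, not code from this repo; it assumes one float32 attention matrix per GAT head and ignores activations saved for backprop):

```python
def pair_count(num_events: int) -> int:
    # Number of unordered event pairs: n * (n - 1) / 2
    return num_events * (num_events - 1) // 2

for n in (4, 30):  # roughly a CTB sentence vs. an ESL document
    p = pair_count(n)
    mib = p * p * 4 / 1024**2  # float32 attention matrix over all pairs
    print(f"{n} events -> {p} pairs, {p}x{p} attention matrix ~ {mib:.2f} MiB per head")
```

With 4 events this is a 6x6 matrix, essentially free; with 30 events it is already 435x435 per head per layer, and the gap widens further once gradients and the extra cross-sentence layers are counted.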

Catworkdog commented 1 month ago

Thank you very much for your reply.