CarloLucibello / GraphNeuralNetworks.jl

Graph Neural Networks in Julia
https://carlolucibello.github.io/GraphNeuralNetworks.jl/dev/
MIT License
216 stars 46 forks source link

fixing overflow in GAT layers #244

Closed pevnak closed 1 year ago

pevnak commented 1 year ago

I have found that attention in GAT can overflow, as attention layers likes to due to the presence of exp which is not checked for maximum. I propose a quick and likely very dirty hack. What I do not like on the solution is that it done in function message and works only because there is a subsequent division in other function. I run full test suit and seems to work.

codecov[bot] commented 1 year ago

Codecov Report

Merging #244 (316530b) into master (1381a78) will decrease coverage by 7.56%. The diff coverage is 17.94%.

@@            Coverage Diff             @@
##           master     #244      +/-   ##
==========================================
- Coverage   88.29%   80.72%   -7.57%     
==========================================
  Files          16       17       +1     
  Lines        1597     1733     +136     
==========================================
- Hits         1410     1399      -11     
- Misses        187      334     +147     
Impacted Files Coverage Δ
src/GNNGraphs/gnnheterograph.jl 23.94% <0.00%> (-57.88%) :arrow_down:
src/layers/conv.jl 76.88% <100.00%> (+0.10%) :arrow_up:
src/utils.jl 100.00% <100.00%> (ø)
src/GNNGraphs/gnngraph.jl 55.11% <0.00%> (-44.07%) :arrow_down:
src/GNNGraphs/utils.jl 75.40% <0.00%> (-16.34%) :arrow_down:
src/GNNGraphs/gatherscatter.jl 75.00% <0.00%> (-5.00%) :arrow_down:
src/GNNGraphs/datastore.jl 72.72% <0.00%> (ø)
src/GNNGraphs/transform.jl 96.55% <0.00%> (+0.43%) :arrow_up:

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

CarloLucibello commented 1 year ago

A problem of the maximum trick in this setting is that the maximum is computed over all edges in the graph, while the softmax is over the neighborhood. I fear it could cause underflow problems instead of overflow ones.

A robust solution involves softmax_edges_neighborhood, which is also what PyG uses in GAT layers. That function should also be modified to add an eps to the denominator.

pevnak commented 1 year ago

I see,

but we agree that the implementation of GATConv has overflow issue and should be fixed, right?

pevnak commented 1 year ago

A new attempt to fix it, though I had to write propagate to the (l::GATConv)

CarloLucibello commented 1 year ago

Yes it should be fixed, but likely we will lose a factor close to two in performance

pevnak commented 1 year ago

I know, but do not know what to do with this. Seems like it is done either fast or correctly. Do you have any idea, how to deal with it?

CarloLucibello commented 1 year ago

let's be safe, it's also what other frameworks do. Can you apply the same pattern to GATv2Conv?