CarloLucibello / GraphNeuralNetworks.jl

Graph Neural Networks in Julia
https://carlolucibello.github.io/GraphNeuralNetworks.jl/dev/
MIT License

GPU memory filling up #150

Closed · casper2002casper closed 2 years ago

casper2002casper commented 2 years ago

When repeatedly performing inference operations, GPU memory fills up quite fast, forcing frequent garbage collection on the GPU. In my implementation this accounted for 50% of GPU time, with a GC pass needed every couple of seconds. With a plain NN, GPU memory grows much more slowly.

using Flux
using GraphNeuralNetworks
using CUDA

N = 1000   # nodes per graph
M = 10     # graphs per batch
I = 10     # inference iterations

# Run inference I times, building a fresh batch each time,
# and print GPU memory usage after every forward pass.
function test_mem(n, make_data)
    for i in 1:I
        g = make_data()
        b_g = Flux.batch([g for i in 1:M]) |> gpu
        x = n(b_g)
        CUDA.memory_status()
    end
end

println("GNN:")
# path graph with one scalar feature per node
make_data() = GNNGraph(collect(1:N-1), collect(2:N), num_nodes = N, ndata = rand(1, N))
n = GNNChain(Dense(1, 1000), Dense(1000, 1)) |> gpu
CUDA.@time test_mem(n, make_data)

println("NN:")
make_data() = rand(3*N, 1)
n = Chain(Dense(3*N, 1000), Dense(1000, 1)) |> gpu
CUDA.@time test_mem(n, make_data)
GNN:
Effective GPU memory usage: 59.74% (1.169 GiB/1.958 GiB)
No memory pool is in use.
Effective GPU memory usage: 63.61% (1.245 GiB/1.958 GiB)
No memory pool is in use.
Effective GPU memory usage: 67.43% (1.320 GiB/1.958 GiB)
No memory pool is in use.
Effective GPU memory usage: 71.24% (1.395 GiB/1.958 GiB)
No memory pool is in use.
Effective GPU memory usage: 75.11% (1.470 GiB/1.958 GiB)
No memory pool is in use.
Effective GPU memory usage: 78.93% (1.545 GiB/1.958 GiB)
No memory pool is in use.
Effective GPU memory usage: 82.74% (1.620 GiB/1.958 GiB)
No memory pool is in use.
Effective GPU memory usage: 36.35% (728.562 MiB/1.958 GiB)
No memory pool is in use.
Effective GPU memory usage: 40.16% (805.062 MiB/1.958 GiB)
No memory pool is in use.
Effective GPU memory usage: 43.98% (881.562 MiB/1.958 GiB)
No memory pool is in use.
  0.217427 seconds (74.57 k CPU allocations: 23.878 MiB, 6.62% gc time) (80 GPU allocations: 766.371 MiB, 42.74% memmgmt time)
NN:
Effective GPU memory usage: 44.50% (892.062 MiB/1.958 GiB)
No memory pool is in use.
Effective GPU memory usage: 44.50% (892.062 MiB/1.958 GiB)
No memory pool is in use.
Effective GPU memory usage: 44.50% (892.062 MiB/1.958 GiB)
No memory pool is in use.
Effective GPU memory usage: 44.50% (892.062 MiB/1.958 GiB)
No memory pool is in use.
Effective GPU memory usage: 44.50% (892.062 MiB/1.958 GiB)
No memory pool is in use.
Effective GPU memory usage: 44.50% (892.062 MiB/1.958 GiB)
No memory pool is in use.
Effective GPU memory usage: 44.50% (892.062 MiB/1.958 GiB)
No memory pool is in use.
Effective GPU memory usage: 44.50% (892.062 MiB/1.958 GiB)
No memory pool is in use.
Effective GPU memory usage: 44.50% (892.062 MiB/1.958 GiB)
No memory pool is in use.
Effective GPU memory usage: 44.50% (892.062 MiB/1.958 GiB)
No memory pool is in use.
  0.057799 seconds (60.44 k CPU allocations: 6.917 MiB) (50 GPU allocations: 1.908 MiB, 1.53% memmgmt time)
casper2002casper commented 2 years ago

Any update on this? It seems like a major issue to me as it slows down any form of inference significantly.

CarloLucibello commented 2 years ago

I tried to reduce the example; it turns out that the problem is not GNN.jl related but due to the allocation strategy of CUDA.jl (about which I know little). In your original example, the comparison wasn't measuring comparable operations for the GNN and the NN.

Here are a few comparisons:

using GraphNeuralNetworks, CUDA, Flux

N = 10000   # number of nodes
I = 10      # inference iterations

# Run the model I times on the same data, printing memory usage after each pass.
function test_mem(n, data)
    for i in 1:I
        y = n(data)
        CUDA.memory_status()
    end
end

GC.gc(); CUDA.reclaim();   # start each benchmark from a clean memory pool
println("GNN, memory filling")
g = GNNGraph(collect(1:N-1), collect(2:N), num_nodes = N, ndata = rand(Float32, 1, N)) |> gpu
gnnchain = GNNChain(Dense(1, 1000), Dense(1000, 1)) |> gpu
CUDA.@time test_mem(gnnchain, g)

GC.gc(); CUDA.reclaim();
println("\n\nNN equivalent to GNN, memory filling")
x = g.ndata.x
chain = Chain(gnnchain.layers...)
@assert gnnchain(g).ndata.x ≈ chain(x)
## same results with these
# x = rand(1, N) |> gpu
# chain = Chain(Dense(1, 1000), Dense(1000, 1)) |> gpu
CUDA.@time test_mem(chain, x)

GC.gc(); CUDA.reclaim();
println("\n\nNN 1, same memory")
data = rand(N, 1) |> gpu
n = Chain(Dense(N, 1000), Dense(1000, 1)) |> gpu
CUDA.@time test_mem(n, data)

println("\n\nNN 2, memory filling:")
data = rand(N, N) |> gpu
n = Chain(Dense(N, 1000), Dense(1000, 1)) |> gpu
CUDA.@time test_mem(n, data)
julia> include("bench.jl")
GNN, memory filling
Effective GPU memory usage: 21.33% (834.750 MiB/3.823 GiB)
Memory pool usage: 496.399 MiB (544.000 MiB reserved)
Effective GPU memory usage: 22.96% (898.750 MiB/3.823 GiB)
Memory pool usage: 572.769 MiB (608.000 MiB reserved)
Effective GPU memory usage: 25.41% (994.750 MiB/3.823 GiB)
Memory pool usage: 649.139 MiB (704.000 MiB reserved)
Effective GPU memory usage: 27.05% (1.034 GiB/3.823 GiB)
Memory pool usage: 725.510 MiB (768.000 MiB reserved)
Effective GPU memory usage: 29.50% (1.128 GiB/3.823 GiB)
Memory pool usage: 801.880 MiB (864.000 MiB reserved)
Effective GPU memory usage: 30.32% (1.159 GiB/3.823 GiB)
Memory pool usage: 878.250 MiB (896.000 MiB reserved)
Effective GPU memory usage: 31.95% (1.221 GiB/3.823 GiB)
Memory pool usage: 954.620 MiB (960.000 MiB reserved)
Effective GPU memory usage: 34.41% (1.315 GiB/3.823 GiB)
Memory pool usage: 1.007 GiB (1.031 GiB reserved)
Effective GPU memory usage: 36.04% (1.378 GiB/3.823 GiB)
Memory pool usage: 1.081 GiB (1.094 GiB reserved)
Effective GPU memory usage: 37.68% (1.440 GiB/3.823 GiB)
Memory pool usage: 1.156 GiB (1.156 GiB reserved)
  0.028500 seconds (9.53 k CPU allocations: 522.793 KiB) (40 GPU allocations: 763.702 MiB, 10.16% memmgmt time)

NN equivalent to GNN, memory filling
Effective GPU memory usage: 25.41% (994.750 MiB/3.823 GiB)
Memory pool usage: 649.025 MiB (704.000 MiB reserved)
Effective GPU memory usage: 27.05% (1.034 GiB/3.823 GiB)
Memory pool usage: 725.395 MiB (768.000 MiB reserved)
Effective GPU memory usage: 29.50% (1.128 GiB/3.823 GiB)
Memory pool usage: 801.765 MiB (864.000 MiB reserved)
Effective GPU memory usage: 30.32% (1.159 GiB/3.823 GiB)
Memory pool usage: 878.136 MiB (896.000 MiB reserved)
Effective GPU memory usage: 31.95% (1.221 GiB/3.823 GiB)
Memory pool usage: 954.506 MiB (960.000 MiB reserved)
Effective GPU memory usage: 34.41% (1.315 GiB/3.823 GiB)
Memory pool usage: 1.007 GiB (1.031 GiB reserved)
Effective GPU memory usage: 36.04% (1.378 GiB/3.823 GiB)
Memory pool usage: 1.081 GiB (1.094 GiB reserved)
Effective GPU memory usage: 37.68% (1.440 GiB/3.823 GiB)
Memory pool usage: 1.156 GiB (1.156 GiB reserved)
Effective GPU memory usage: 40.13% (1.534 GiB/3.823 GiB)
Memory pool usage: 1.230 GiB (1.250 GiB reserved)
Effective GPU memory usage: 41.76% (1.596 GiB/3.823 GiB)
Memory pool usage: 1.305 GiB (1.312 GiB reserved)
  0.027846 seconds (9.17 k CPU allocations: 508.121 KiB) (40 GPU allocations: 763.702 MiB, 8.10% memmgmt time)

NN 1, same memory
Effective GPU memory usage: 20.51% (802.750 MiB/3.823 GiB)
Memory pool usage: 458.027 MiB (512.000 MiB reserved)
Effective GPU memory usage: 20.51% (802.750 MiB/3.823 GiB)
Memory pool usage: 458.035 MiB (512.000 MiB reserved)
Effective GPU memory usage: 20.51% (802.750 MiB/3.823 GiB)
Memory pool usage: 458.042 MiB (512.000 MiB reserved)
Effective GPU memory usage: 20.51% (802.750 MiB/3.823 GiB)
Memory pool usage: 458.050 MiB (512.000 MiB reserved)
Effective GPU memory usage: 20.51% (802.750 MiB/3.823 GiB)
Memory pool usage: 458.057 MiB (512.000 MiB reserved)
Effective GPU memory usage: 20.51% (802.750 MiB/3.823 GiB)
Memory pool usage: 458.065 MiB (512.000 MiB reserved)
Effective GPU memory usage: 20.51% (802.750 MiB/3.823 GiB)
Memory pool usage: 458.073 MiB (512.000 MiB reserved)
Effective GPU memory usage: 20.51% (802.750 MiB/3.823 GiB)
Memory pool usage: 458.080 MiB (512.000 MiB reserved)
Effective GPU memory usage: 20.51% (802.750 MiB/3.823 GiB)
Memory pool usage: 458.088 MiB (512.000 MiB reserved)
Effective GPU memory usage: 20.51% (802.750 MiB/3.823 GiB)
Memory pool usage: 458.096 MiB (512.000 MiB reserved)
  0.005307 seconds (1.64 k CPU allocations: 86.688 KiB) (40 GPU allocations: 78.203 KiB, 1.58% memmgmt time)

NN 2, memory filling:
Effective GPU memory usage: 31.95% (1.221 GiB/3.823 GiB)
Memory pool usage: 954.014 MiB (960.000 MiB reserved)
Effective GPU memory usage: 34.41% (1.315 GiB/3.823 GiB)
Memory pool usage: 1.006 GiB (1.031 GiB reserved)
Effective GPU memory usage: 36.04% (1.378 GiB/3.823 GiB)
Memory pool usage: 1.081 GiB (1.094 GiB reserved)
Effective GPU memory usage: 37.68% (1.440 GiB/3.823 GiB)
Memory pool usage: 1.155 GiB (1.156 GiB reserved)
Effective GPU memory usage: 40.13% (1.534 GiB/3.823 GiB)
Memory pool usage: 1.230 GiB (1.250 GiB reserved)
Effective GPU memory usage: 41.76% (1.596 GiB/3.823 GiB)
Memory pool usage: 1.305 GiB (1.312 GiB reserved)
Effective GPU memory usage: 44.22% (1.690 GiB/3.823 GiB)
Memory pool usage: 1.379 GiB (1.406 GiB reserved)
Effective GPU memory usage: 45.85% (1.753 GiB/3.823 GiB)
Memory pool usage: 1.454 GiB (1.469 GiB reserved)
Effective GPU memory usage: 47.49% (1.815 GiB/3.823 GiB)
Memory pool usage: 1.528 GiB (1.531 GiB reserved)
Effective GPU memory usage: 49.94% (1.909 GiB/3.823 GiB)
Memory pool usage: 1.603 GiB (1.625 GiB reserved)
  2.422488 seconds (1.74 k CPU allocations: 88.312 KiB) (40 GPU allocations: 763.702 MiB, 0.08% memmgmt time)
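
In all four cases the pool grows in step with the size of the per-iteration activations, so the GNN is not special here. If the repeated GC pauses hurt inference throughput, one possible workaround is to free large device buffers explicitly and reclaim the pool every so often. A minimal sketch, assuming CUDA.jl's unsafe_free! and reclaim (model and make_batch are placeholder names):

using Flux, CUDA

function infer_all(model, make_batch, niter)
    results = Vector{Matrix{Float32}}(undef, niter)
    for i in 1:niter
        x = make_batch() |> gpu       # fresh batch on the device
        y = model(x)
        results[i] = Array(y)         # copy the (small) result back to the CPU
        CUDA.unsafe_free!(y)          # hand the output buffer back to the pool
        if i % 100 == 0
            GC.gc(false)              # collect unreachable CuArrays (e.g. x)
            CUDA.reclaim()            # return cached pool memory to the driver
        end
    end
    return results
end

The i % 100 cadence is arbitrary: reclaiming more often trades the memory growth for the memmgmt overhead visible in the timings above.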
casper2002casper commented 2 years ago

I see, my bad. I wrongly assumed the NN equivalent of the GNN input to be x = rand(N, 1), while it should be x = rand(1, N).
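
For reference, the shapes explain the gap: with features stored as a 1×N matrix, Dense(1, 1000) produces a 1000×N hidden activation that grows with the number of nodes, while an N×1 input to Dense(N, 1000) yields a constant-size 1000×1 activation. A quick check on the CPU (plain Flux, just to illustrate the shapes):

using Flux

N = 10_000

# GNN-style layout: one scalar feature per node, as a 1×N matrix.
x_gnn = rand(Float32, 1, N)
size(Dense(1, 1000)(x_gnn))   # (1000, N): hidden activation grows with N

# The layout I had assumed: a single N-dimensional sample.
x_nn = rand(Float32, N, 1)
size(Dense(N, 1000)(x_nn))    # (1000, 1): constant-size hidden activation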