JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

Massive memory leak in Julia due to tiny memory leak in C #55794

Open Yixiao-Zhang opened 2 months ago

Yixiao-Zhang commented 2 months ago

This is copied from my post on the Julia Discourse.

The latest version (v4.9.2) of netCDF-C (a C library for writing data in the Network Common Data Form) is known to have a memory leak. The leaked memory amounts to no more than a few MB (see this link). However, when I use NCDatasets.jl, a Julia wrapper for netCDF-C, the memory leak reaches several GB in the demo below:

using Printf
using NCDatasets

function create_dummpy_netcdf()
    dset = NCDataset("output.nc", "c", format=:netcdf4)
    close(dset)
end

function output_dummy_netcdf()
    dset = NCDataset("output.nc", "a")
    zeros(Float64, 1024, 1024)  # 8 MiB array, discarded immediately
    close(dset)
end

function main()

    create_dummpy_netcdf()
    for i in 1:1000
        output_dummy_netcdf()

        @info Printf.@sprintf "Max. RSS:  %9.3f MiB\n" Sys.maxrss()/2^20
    end

end

main()

The peak memory usage (Max. RSS) reaches 8.2 GiB by the end, which is roughly 1000 times the size of one zeros(Float64, 1024, 1024) allocation (8 MiB). Removing zeros(Float64, 1024, 1024) from output_dummy_netcdf keeps the peak memory usage at about 400 MiB. It seems to me that the memory leak in the C code is "amplified" by allocations in Julia.
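The "roughly 1000 times" estimate can be checked with a little arithmetic. The sketch below assumes that the ~400 MiB observed without the zeros call is a fixed baseline and that none of the 8 MiB arrays is ever returned to the OS; the variable names are illustrative only.

```julia
# Size of one zeros(Float64, 1024, 1024) array, in MiB.
array_mib = sizeof(Float64) * 1024 * 1024 / 2^20

# 1000 iterations, assuming no array memory goes back to the OS.
retained_gib = 1000 * array_mib / 1024

# Add the ~400 MiB baseline seen when the zeros call is removed.
total_gib = retained_gib + 400 / 1024

println("per array: ", array_mib, " MiB")
println("retained:  ", retained_gib, " GiB")
println("total:     ", total_gib, " GiB")
```

Under those assumptions the total comes out at about 8.2 GiB, which matches the reported figure.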

This issue has been posted as Alexander-Barth/NCDatasets.jl#266 (with version info). @Alexander-Barth found that this bug can be reproduced simply by ccall-ing functions in libnetcdf.so. I am bringing the discussion here because I think it is related to how Julia manages memory.

I have also tried using Valgrind to profile heap memory usage, but the bug does not reproduce when running under Valgrind. If you know a better tool for profiling memory in Julia, please let me know.
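One caveat when reading the numbers from the demo: Sys.maxrss() reports the peak resident set size, so it can only grow and cannot show memory being handed back to the OS. A minimal sketch of a helper that reads the current RSS instead is given here. It is Linux-only, the function name current_rss_mib is hypothetical, and it assumes /proc/self/statm is available.

```julia
# Current (not peak) resident set size in MiB. Linux only.
# Returns NaN on other platforms, where /proc/self/statm does not exist.
function current_rss_mib()
    Sys.islinux() || return NaN
    fields = split(read("/proc/self/statm", String))
    resident_pages = parse(Int, fields[2])       # second field: resident pages
    page_size = ccall(:getpagesize, Cint, ())    # bytes per page, from libc
    return resident_pages * page_size / 2^20
end

println("Current RSS: ", current_rss_mib(), " MiB")
println("Max. RSS:    ", Sys.maxrss() / 2^20, " MiB")
```

Logging both values inside the loop would show whether RSS stays at its peak or drops between iterations.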

d-netto commented 2 months ago

Copying the versioninfo from https://github.com/Alexander-Barth/NCDatasets.jl/issues/266:

Julia Version 1.10.4
Commit 48d4fd48430 (2024-06-04 10:41 UTC)
Build Info: Official https://julialang.org/ release
Platform Info: OS: Linux (x86_64-linux-gnu)
CPU: 16 × 11th Gen Intel(R) Core(TM) i7-11700 @ 2.50GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, rocketlake)
Threads: 1 default, 0 interactive, 1 GC (on 16 virtual cores)

Tested both on an M2 and on a Linux x86_64 machine. Could reproduce it on Linux, but the increase in RSS seems to be much less severe on Mac.

using Printf
using NCDatasets

function create_dummpy_netcdf()
    dset = NCDataset("output.nc", "c", format=:netcdf4)
    close(dset)
end

function output_dummy_netcdf()
    dset = NCDataset("output.nc", "a")
    zeros(Float64, 1024, 1024)
    close(dset)
end

function main()

    create_dummpy_netcdf()
    for i in 1:1_000
        output_dummy_netcdf()
        @info Printf.@sprintf "Live bytes:  %9.3f MiB\n" Base.gc_live_bytes()/2^20
        @info Printf.@sprintf "Max. RSS:  %9.3f MiB\n" Sys.maxrss()/2^20
    end

end

main()


- Mac:

Julia Version 1.10.4
Commit 48d4fd4843 (2024-06-04 10:41 UTC)
Build Info:

    Note: This is an unofficial build, please report bugs to the project
    responsible for this build and not to the Julia project unless you can
    reproduce the issue using official builds available at https://julialang.org/downloads

Platform Info:
  OS: macOS (arm64-apple-darwin23.4.0)
  CPU: 12 × Apple M2 Max
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, apple-m1)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)

...
[ Info: Live bytes: 29.729 MiB
[ Info: Max. RSS: 631.516 MiB
[ Info: Live bytes: 37.745 MiB
[ Info: Max. RSS: 631.516 MiB
[ Info: Live bytes: 45.760 MiB
[ Info: Max. RSS: 631.516 MiB
[ Info: Live bytes: 53.776 MiB
[ Info: Max. RSS: 631.516 MiB
[ Info: Live bytes: 13.699 MiB
[ Info: Max. RSS: 631.516 MiB
[ Info: Live bytes: 21.714 MiB
[ Info: Max. RSS: 631.531 MiB
[ Info: Live bytes: 29.730 MiB
[ Info: Max. RSS: 631.531 MiB
[ Info: Live bytes: 37.745 MiB
[ Info: Max. RSS: 631.531 MiB


- Linux x86_64:

Julia Version 1.10.4
Commit 48d4fd4843 (2024-06-04 10:41 UTC)
Build Info:

    Note: This is an unofficial build, please report bugs to the project
    responsible for this build and not to the Julia project unless you can
    reproduce the issue using official builds available at https://julialang.org/downloads

Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 36 × Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake-avx512)
Threads: 1 default, 0 interactive, 1 GC (on 36 virtual cores)

...
[ Info: Live bytes: 21.750 MiB
[ Info: Max. RSS: 8097.000 MiB
[ Info: Live bytes: 29.763 MiB
[ Info: Max. RSS: 8104.992 MiB
[ Info: Live bytes: 37.776 MiB
[ Info: Max. RSS: 8112.984 MiB
[ Info: Live bytes: 45.789 MiB
[ Info: Max. RSS: 8120.977 MiB
[ Info: Live bytes: 53.802 MiB
[ Info: Max. RSS: 8128.969 MiB
[ Info: Live bytes: 13.737 MiB
[ Info: Max. RSS: 8136.961 MiB
[ Info: Live bytes: 21.750 MiB
[ Info: Max. RSS: 8144.953 MiB
[ Info: Live bytes: 29.763 MiB
[ Info: Max. RSS: 8152.945 MiB
[ Info: Live bytes: 37.776 MiB
[ Info: Max. RSS: 8160.938 MiB

d-netto commented 2 months ago

Wondering if this could be specific to glibc itself?

(E.g. maybe glibc's malloc fragments more than Apple's, or keeps more freed pages around and is lazier than Apple's about returning them to the OS.)
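One way to probe the glibc hypothesis would be to ask glibc explicitly to return free heap memory to the OS and see whether RSS drops. The sketch below is an assumption-laden experiment, not something proposed in this thread: malloc_trim is a glibc extension (it is absent on macOS and on musl-based Linux), and the helper name trim_glibc_heap is hypothetical.

```julia
# Ask glibc to release free heap memory back to the OS.
# Assumes a glibc-based Linux; returns false on non-Linux systems.
# On a musl-based Linux the malloc_trim symbol would not be found.
function trim_glibc_heap()
    Sys.islinux() || return false
    GC.gc()  # let Julia free its own garbage first
    # malloc_trim(0) returns 1 if memory was released, 0 otherwise.
    return ccall(:malloc_trim, Cint, (Csize_t,), 0) != 0
end

released = trim_glibc_heap()
println("glibc released memory: ", released)
```

If calling this after each output_dummy_netcdf() kept RSS flat, that would point toward glibc retaining freed memory rather than toward a leak of live objects.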

giordano commented 2 months ago

You mean something like #42566?

d-netto commented 2 months ago

Just to confirm this leak is not coming from the pool allocator...

Added this patch to v1.10.4:

diff --git a/src/gc-pages.c b/src/gc-pages.c
index 682e76611f..be2ec0462a 100644
--- a/src/gc-pages.c
+++ b/src/gc-pages.c
@@ -9,6 +9,11 @@
 extern "C" {
 #endif

+JL_DLLEXPORT uint64_t jl_get_pg_size(void)
+{
+    return GC_PAGE_SZ;
+}
+
 // Try to allocate memory in chunks to permit faster allocation
 // and improve memory locality of the pools
 #ifdef _P64
@@ -19,6 +24,12 @@ extern "C" {
 #define MIN_BLOCK_PG_ALLOC (1) // 16 KB

 static int block_pg_cnt = DEFAULT_BLOCK_PG_ALLOC;
+static _Atomic(size_t) current_pg_count = 0;
+
+JL_DLLEXPORT uint64_t jl_current_pg_count(void)
+{
+    return (uint64_t)jl_atomic_load(&current_pg_count);
+}

 void jl_gc_init_page(void)
 {
@@ -148,6 +159,7 @@ exit:
     SetLastError(last_error);
 #endif
     errno = last_errno;
+    jl_atomic_fetch_add(&current_pg_count, 1);
     return meta;
 }

@@ -188,6 +200,7 @@ void jl_gc_free_page(jl_gc_pagemeta_t *pg) JL_NOTSAFEPOINT
     madvise(p, decommit_size, MADV_DONTNEED);
 #endif
     msan_unpoison(p, decommit_size);
+    jl_atomic_fetch_add(&current_pg_count, -1);
 }

 #ifdef __cplusplus

and then ran this MWE:

using Printf
using NCDatasets

function create_dummpy_netcdf()
    dset = NCDataset("output.nc", "c", format=:netcdf4)
    close(dset)
end

function output_dummy_netcdf()
    dset = NCDataset("output.nc", "a")
    zeros(Float64, 1024, 1024)
    close(dset)
end

function main()

    create_dummpy_netcdf()
    for i in 1:1_000
        output_dummy_netcdf()
        @info Printf.@sprintf "Live bytes:  %9.3f MiB\n" Base.gc_live_bytes()/2^20
        @info Printf.@sprintf "Max. RSS:  %9.3f MiB\n" Sys.maxrss()/2^20
        @info Printf.@sprintf "Current page count: %d\n" @ccall jl_current_pg_count()::UInt64  # matches the uint64_t return type in the patch
    end
end

main()

Current page count was fairly stable:

...
[ Info: Live bytes:     53.916 MiB
[ Info: Max. RSS:   8102.328 MiB
[ Info: Current page count: 1980
[ Info: Live bytes:     13.822 MiB
[ Info: Max. RSS:   8110.320 MiB
[ Info: Current page count: 1980
[ Info: Live bytes:     21.841 MiB
[ Info: Max. RSS:   8118.312 MiB
[ Info: Current page count: 1980
[ Info: Live bytes:     29.860 MiB
[ Info: Max. RSS:   8126.305 MiB
[ Info: Current page count: 1980
[ Info: Live bytes:     37.879 MiB
[ Info: Max. RSS:   8134.297 MiB
[ Info: Current page count: 1980
[ Info: Live bytes:     45.898 MiB
[ Info: Max. RSS:   8142.289 MiB
[ Info: Current page count: 1980
[ Info: Live bytes:     53.916 MiB
[ Info: Max. RSS:   8150.281 MiB
[ Info: Current page count: 1980
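
To put the log above in perspective, the page count can be converted to MiB and compared with the growth in Max. RSS. This is a rough sketch: the 16 KiB page size is an assumption taken from the "// 16 KB" comment in the patch (the exported jl_get_pg_size would give the real value), and the RSS values are copied from the log.

```julia
# Assumption: GC_PAGE_SZ is 16 KiB, per the "// 16 KB" comment in the patch.
page_size_bytes = 16 * 1024
page_count = 1980                      # constant across the logged iterations
pool_mib = page_count * page_size_bytes / 2^20

# Max. RSS values (MiB) from the seven iterations shown in the log.
rss = [8102.328, 8110.320, 8118.312, 8126.305, 8134.297, 8142.289, 8150.281]
growth_per_iter = (rss[end] - rss[1]) / (length(rss) - 1)

println("pool pages: ", pool_mib, " MiB")
println("RSS growth: ", round(growth_per_iter, digits=3), " MiB per iteration")
```

Under that assumption the pool pages account for about 31 MiB in total, while Max. RSS grows by about 8 MiB per iteration, close to the size of one zeros(Float64, 1024, 1024) array.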

d-netto commented 2 months ago

> You mean something like https://github.com/JuliaLang/julia/issues/42566?

Possibly.

vchuravy commented 2 months ago

I get:

[ Info: Max. RSS:    478.117 MiB
[ Info: Max. RSS:    478.117 MiB
  0.932792 seconds (753.56 k allocations: 7.865 GiB, 18.04% gc time, 22.93% compilation time)
julia> versioninfo()
Julia Version 1.10.5
Commit 6f3fdf7b362 (2024-08-27 14:19 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 16 × AMD Ryzen 7 7840U w/ Radeon  780M Graphics
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 16 virtual cores)