JuliaInterop / JavaCall.jl

Call Java from Julia
http://juliainterop.github.io/JavaCall.jl
Other
118 stars 53 forks source link

Crashing after successive Java calls (Java not freeing memory?) #122

Closed el-oso closed 3 years ago

el-oso commented 4 years ago

When running successive calls of the same function I saw that julia crashes. In this case 50 times.

By increasing the heap size "-Xmx1024M" to " -Xmx4096M", the problem goes away, but then it comes back if I doubled the number of runs from 50 to 100.

julia> versioninfo()
Julia Version 1.4.2
Commit 44fa15b150* (2020-05-23 18:35 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: AMD Ryzen 7 1700 Eight-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, znver1)
Environment:
  JULIA_COPY_STACKS = 1
  JULIA_NUM_THREADS = 16
(@v1.4) pkg> st JavaCall
Status `~/.julia/environments/v1.4/Project.toml`
  [494afd89] JavaCall v0.7.4 [`~/.julia/dev/JavaCall`]

Java code:

import java.util.Arrays;

public class JavaSum {
static final int ITERATIONS = 50;
 public static double jsum(double[] array) {
    double sum = 0;
    for (double value : array) {
        sum += value;
    }
    return sum;
}

public static double jsum2(double[] array) {
    return Arrays.stream(array).sum();
}

public static void main(String[] args) {
    double[] myDoubleArray = new double[10_000_000];
    double total;
        for (int i = 0; i < myDoubleArray.length; i++) {
            myDoubleArray[i] = Math.random() * 100;
        }

long start = System.currentTimeMillis(), last = start;
    for(int iter=0; iter < ITERATIONS; iter++) {
        long now = System.currentTimeMillis();
        jsum(myDoubleArray);
        //System.out.printf("(%d ms)%n",  (double)(now - last));
        last = now;
    }
    total = System.currentTimeMillis() - start;
    System.out.printf("total: %f (%f ms)%n", total, total / (double) ITERATIONS);
    }
}

Julia code:

using JavaCall
using StatsBase: describe

JavaCall.init(["-Xmx4024M", "-Djava.class.path=$(@__DIR__)"])
const j_sum = @jimport JavaSum

function java_sum(arr)
    jcall(j_sum, "jsum", jdouble, (Array{jdouble,1},), arr)
end

function framed_java_sum(arr)
    JavaCall.JNI.PushLocalFrame(Int32(length(arr)))
    result = java_sum(arr)
    JavaCall.JNI.PopLocalFrame(nothing)
end

function time_sum()
    a = rand(10^7)
    [@elapsed java_sum(a) for i in 1:100] .* 1000 |> describe
end

time_sum()

I tried to use PushLocalFrame() as suggested by @mkitti but didn't work out. See attached files.

java_sum_test.tar.gz

mkitti commented 4 years ago

Confirmed. I get a segfault from SetDoubleArrayRegion

signal (11): Segmentation fault
in expression starting at REPL[6]:1
jni_SetDoubleArrayRegion at ~/anaconda3/jre/lib/amd64/server/libjvm.so (unknown line)
SetDoubleArrayRegion at ~/.julia/dev/JavaCall/src/JNI.jl:630
SetDoubleArrayRegion at ~/.julia/dev/JavaCall/src/JNI.jl:630 [inlined]
convert_arg at ~/.julia/dev/JavaCall/src/convert.jl:75
unknown function (ip: 0x7f1cdb583f03)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2214 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2398
convert_args at ~/.julia/dev/JavaCall/src/convert.jl:41
_jcall at ~/.julia/dev/JavaCall/src/core.jl:197
unknown function (ip: 0x7f1cdb583dd2)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2214 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2398
jcall at ~/.julia/dev/JavaCall/src/core.jl:124
unknown function (ip: 0x7f1cdb583642)
java_sum at ~/src/javacall_issue_122/javacall.jl:8
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2214 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2398
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1689 [inlined]
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:117
eval_value at /buildworker/worker/package_linux64/build/src/interpreter.c:206
eval_stmt_value at /buildworker/worker/package_linux64/build/src/interpreter.c:157 [inlined]
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:566
jl_interpret_toplevel_thunk at /buildworker/worker/package_linux64/build/src/interpreter.c:660
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:840
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:790
jl_toplevel_eval_in at /buildworker/worker/package_linux64/build/src/toplevel.c:883
eval at ./boot.jl:331
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2214 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2398
eval_user_input at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/REPL/src/REPL.jl:134
repl_backend_loop at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/REPL/src/REPL.jl:195
start_repl_backend at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/REPL/src/REPL.jl:180
#run_repl#37 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/REPL/src/REPL.jl:292
run_repl at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/REPL/src/REPL.jl:288
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2231 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2398
#802 at ./client.jl:399
jfptr_YY.802_43008.clone_1 at ~/src/julia-1.5.0-beta1-linux-x86_64/julia-1.5.0-beta1/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2214 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2398
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1689 [inlined]
do_apply at /buildworker/worker/package_linux64/build/src/builtins.c:655
jl_f__apply_latest at /buildworker/worker/package_linux64/build/src/builtins.c:705
#invokelatest#1 at ./essentials.jl:710 [inlined]
invokelatest at ./essentials.jl:709 [inlined]
run_main_repl at ./client.jl:383
exec_options at ./client.jl:313
_start at ./client.jl:506
jfptr__start_35292.clone_1 at ~/src/julia-1.5.0-beta1-linux-x86_64/julia-1.5.0-beta1/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2214 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2398
jl_apply at /buildworker/worker/package_linux64/build/ui/../src/julia.h:1689 [inlined]
true_main at /buildworker/worker/package_linux64/build/ui/repl.c:106
main at /buildworker/worker/package_linux64/build/ui/repl.c:227
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
_start at ../julia-1.5.0-beta1-linux-x86_64/julia-1.5.0-beta1/bin/julia (unknown line)
Allocations: 5100881 (Pool: 5099473; Big: 1408); GC: 7
Segmentation fault (core dumped)
el-oso commented 4 years ago

Adding my stack trace

$ julia javacall.jl 
[ Info: JavaCall could not determine javapath from `which java`

signal (11): Segmentation fault
in expression starting at /home/el_oso/tmp/example/javacall.jl:24
unknown function (ip: 0x7f7458b91eb1)
unknown function (ip: 0x7ffd56904fef)
SetDoubleArrayRegion at /home/el_oso/.julia/dev/JavaCall/src/JNI.jl:630
SetDoubleArrayRegion at /home/el_oso/.julia/dev/JavaCall/src/JNI.jl:630 [inlined]
convert_arg at /home/el_oso/.julia/dev/JavaCall/src/convert.jl:75
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2145 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2323
convert_args at /home/el_oso/.julia/dev/JavaCall/src/convert.jl:41
_jcall at /home/el_oso/.julia/dev/JavaCall/src/core.jl:197
unknown function (ip: 0x7f74c02e5cdf)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2145 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2323
jcall at /home/el_oso/.julia/dev/JavaCall/src/core.jl:124
java_sum at /home/el_oso/tmp/example/javacall.jl:8 [inlined]
#3 at ./util.jl:234 [inlined]
iterate at ./generator.jl:47 [inlined]
collect_to! at ./array.jl:711
collect_to_with_first! at ./array.jl:689 [inlined]
collect at ./array.jl:670
time_sum at /home/el_oso/tmp/example/javacall.jl:21
unknown function (ip: 0x7f74c02e103c)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2159 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2323
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1700 [inlined]
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:369
eval_value at /buildworker/worker/package_linux64/build/src/interpreter.c:458
eval_stmt_value at /buildworker/worker/package_linux64/build/src/interpreter.c:409 [inlined]
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:817
jl_interpret_toplevel_thunk at /buildworker/worker/package_linux64/build/src/interpreter.c:911
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:819
jl_parse_eval_all at /buildworker/worker/package_linux64/build/src/ast.c:872
jl_load at /buildworker/worker/package_linux64/build/src/toplevel.c:877
include at ./Base.jl:377
exec_options at ./client.jl:288
_start at ./client.jl:484
jfptr__start_2202 at /home/el_oso/Downloads/julia-1.4.2/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2145 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2323
unknown function (ip: 0x401931)
unknown function (ip: 0x401533)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x4015d4)
Allocations: 5815883 (Pool: 5814751; Big: 1132); GC: 6
Segmentation fault (core dumped)
mkitti commented 4 years ago

The memory leak was plugged by https://github.com/JuliaInterop/JavaCall.jl/commit/8782ac728ba6939edb4f52fa742ec0e77659af46 . This might be too aggressive as I suspect that these changes might delete the local references not created as part of call.

mkitti commented 4 years ago

I also added a new test based on this issue since this was pretty easy to reproduce. https://github.com/JuliaInterop/JavaCall.jl/commit/81029b7220803d743bf966a059f8981973cfad3c

mkitti commented 4 years ago

I confirmed with @el-oso that the issue is also resolved in his environment as well so this issue is closeable.

I will keep the issue open a bit longer as I mop up some of the side effects.