cnuernber / libjulia-clj

Julia bindings for Clojure -- Currently somewhat unstable --
MIT License

A comfortable way to use createArray/arrayToJvm? #4

Closed. subes closed this issue 2 years ago.

subes commented 2 years ago

I tried this:

final Object created = libjulia_clj.java_api.createArray("int8", new int[] { 1, 1 }, new byte[] { 1 });
final Map value = libjulia_clj.java_api.arrayToJVM(created);
  1. When using createArray, how do I put the result into a variable in Julia and reference that variable in a script?

    • libjulia_clj.java_api.runString("println(varinfo())"); outputs:
      | name        |      size | summary                                      |
      |:----------- | ---------:|:-------------------------------------------- |
      | Base        |           | Module                                       |
      | Core        |           | Module                                       |
      | Main        |           | Module                                       |
      | isinstalled |   0 bytes | isinstalled (generic function with 1 method) |
      | jvm_refs    | 361 bytes | IdDict{Any, Any} with 2 entries              |
    • libjulia_clj.java_api.runString("println(jvm_refs)"); outputs:
      IdDict{Any, Any}(nothing => nothing, Int8[1;;] => Int8[1;;], Int8[1;;] => Int8[1;;], UndefInitializer() => UndefInitializer())
    • So do I always have to take the second-to-last element of the dict to get the last created array? Having to use runString("variableName = <second-to-last element of dict>") as a second call is a bit of overhead. Also, the dict has no distinguishable key/id for the new array. So maybe add another parameter for the name of the variable or dict key?
    • The JuliaArray that is returned from createArray also has no id/key/variable name to identify it on the other side.
  2. I guess I have to use runString("variableName") to get the JuliaArray pointer of a variable, then use arrayToJvm on that pointer to retrieve the data? A single combined operation could maybe make this faster?

So maybe something like putGlobal/getGlobal as in libpython-clj could make this easier and faster?

In the same way I did not understand the createArray/arrayToJvm functions of libpython-clj. ^^

cnuernber commented 2 years ago

The arrayToJVM pathway takes a Julia array and returns a map of three things: datatype, shape, and data. It copies the data entirely back into the JVM, leaving no reference to the Julia data.

I haven't yet implemented the copy pathway for updating a Julia array from the JVM or vice versa, nor have I directly exposed the zero-copy pathway, which would return an NDBuffer interface. For now I would like to avoid exposing that interface and just have ways to create arrays and copy data into/from them via JVM primitive arrays.

Julia does have a get/set global pathway that it exposes, but I haven't used it much since I just use functions. I will expose it and let's see what happens.

cnuernber commented 2 years ago

On second glance, the jl_get_global pathway is used to get global variables exported from modules directly. I am not sure the setGlobal, getGlobal, runScript pathway works for Julia. Perhaps only the runScript, callFn pathway is valid here.

cnuernber commented 2 years ago

See discussions on Discourse such as this one.

cnuernber commented 2 years ago

The return value of createArray in both systems is the canonical dense array type of that system: in Python it is a numpy array that you can access from Java, and in Julia it is a DenseArray that you can access from Java. You can pass it directly as an argument in a function call, or in the Python API you can copy new data into or out of it.

I am a bit unclear as to where the confusion is :-).

subes commented 2 years ago

With Python I understood afterwards that one can use it by putting the array into a global, thus turning it into a variable. With Julia I now understand that I should use this:

IFn putGlobalFunction = (IFn) libjulia_clj.java_api.runString("function libjuliaclj_putGlobal(variable, value); global __ans__ = value; eval(Meta.parse(\"global \"*variable*\" = __ans__\")); return nothing; end");
final Object array = libjulia_clj.java_api.createArray("int32", new int[] { 1, 1 }, new int[] { 100 });
putGlobalFunction.invoke("asdf", array);
final Object arrayGet = libjulia_clj.java_api.runString("asdf");
final Map jvmArray = libjulia_clj.java_api.arrayToJVM(arrayGet);
System.out.println(jvmArray.get("datatype"));
System.out.println(Arrays.toString((int[]) jvmArray.get("shape")));
System.out.println(Arrays.toString((int[]) jvmArray.get("data")));

Giving this output:

int32
[1, 1]
[100]

Thus this works well and is fine for my use case for the data types that are listed in the docs.

Though it seems Bool (with boolean in Java), Char (with char in Java), and String (with String in Java) are not supported as data types. I guess this is the case both for createArray and for arrayToJvm; I get exceptions when I try these data types.

cnuernber commented 2 years ago

That is clever. I don't think it will be very quick, as you are compiling code at runtime, but it will work.

For createArray and boolean I am not sure about Julia's binary representation; char is an unsigned short in terms of binary representation, which is supported, and strings are objects.

This is the same with Python. You can marshal strings across in fn calls, but a binary representation would be a byte array of UTF-8 encoded data, or for multiple strings a byte array plus an offset array.
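The byte-array-plus-offsets layout mentioned above could look like the following sketch in plain Java. The class and method names are made up for illustration; this is not libjulia-clj API, just one way to flatten multiple strings into two primitive arrays.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class StringPacking {
    /** Pack strings into one UTF-8 byte array plus an offset array.
     *  offsets[i] is where string i starts; offsets[n] is the total length. */
    static Object[] pack(String[] strings) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int[] offsets = new int[strings.length + 1];
        for (int i = 0; i < strings.length; i++) {
            offsets[i] = bytes.size();
            byte[] utf8 = strings[i].getBytes(StandardCharsets.UTF_8);
            bytes.write(utf8, 0, utf8.length);
        }
        offsets[strings.length] = bytes.size();
        return new Object[] { bytes.toByteArray(), offsets };
    }

    /** Reverse: slice the byte array back into strings using the offsets. */
    static String[] unpack(byte[] data, int[] offsets) {
        String[] out = new String[offsets.length - 1];
        for (int i = 0; i < out.length; i++) {
            out[i] = new String(data, offsets[i], offsets[i + 1] - offsets[i],
                    StandardCharsets.UTF_8);
        }
        return out;
    }

    public static void main(String[] args) {
        Object[] packed = pack(new String[] { "foo", "bär", "baz" });
        String[] roundTripped = unpack((byte[]) packed[0], (int[]) packed[1]);
        System.out.println(String.join(",", roundTripped)); // foo,bär,baz
    }
}
```

Both arrays are plain primitives, so they could travel over the existing createArray pathway; the receiving side only needs the same offset convention to reconstruct the strings.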

subes commented 2 years ago

This sadly gives an exception:

final IFn putGlobalFunction = (IFn) libjulia_clj.java_api.runString(
                "function libjuliaclj_putGlobal(variable, value); global __ans__ = value; eval(Meta.parse(\"global \"*variable*\" = __ans__\")); return nothing; end");
        putGlobalFunction.invoke("asdf", new int[] { 1, 2, 3 });
2022-01-07 14:54:28.263 [ |7-1:InitializingJul] ERROR de.invesdwin.ERROR.process                                   - processing #00000001
de.invesdwin.context.log.error.LoggedRuntimeException: #00000001 java.lang.Exception: Item [I@438c4fbb is not convertible to julia
        ... 13 omitted, see following cause or error.log
Caused by - java.lang.Exception: Item [I@438c4fbb is not convertible to julia
        at libjulia_clj.impl.base$eval13013$fn__13014.invoke(base.clj:824)
        at libjulia_clj.impl.protocols$eval12377$fn__12378$G__12368__12383.invoke(protocols.clj:9)
        at libjulia_clj.impl.base$jvm_args__GT_julia$fn__12698.invoke(base.clj:422)
        at clojure.core$mapv$fn__8468.invoke(core.clj:6914)
        at clojure.lang.PersistentVector.reduce(PersistentVector.java:343)
        at clojure.core$reduce.invokeStatic(core.clj:6829)
        at clojure.core$mapv.invokeStatic(core.clj:6905)
        at clojure.core$mapv.invoke(core.clj:6905)
        at libjulia_clj.impl.base$jvm_args__GT_julia.invokeStatic(base.clj:421)
        at libjulia_clj.impl.base$jvm_args__GT_julia.invoke(base.clj:419)
        at libjulia_clj.impl.base$raw_call_function$fn__12711$fn__12712.invoke(base.clj:457)
        at libjulia_clj.impl.base$raw_call_function$fn__12711.invoke(base.clj:457)
        at clojure.lang.AFn.applyToHelper(AFn.java:152)
        at clojure.lang.AFn.applyTo(AFn.java:144)
        at clojure.core$apply.invokeStatic(core.clj:667)
        at clojure.core$with_bindings_STAR_.invokeStatic(core.clj:1977)
        at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1977)
        at clojure.lang.RestFn.invoke(RestFn.java:425)
        at libjulia_clj.impl.base$raw_call_function.invokeStatic(base.clj:455)
        at libjulia_clj.impl.base$raw_call_function.invoke(base.clj:450)
        at libjulia_clj.impl.base$call_function.invokeStatic(base.clj:482)
        at libjulia_clj.impl.base$call_function.invoke(base.clj:477)
        at libjulia_clj.impl.base.CallableJuliaObject.invoke(base.clj:296)
      * at de.invesdwin.context.julia.runtime.libjuliaclj.internal.UncheckedJuliaEngineWrapper.init(UncheckedJuliaEngineWrapper.java:64) *
      * at de.invesdwin.util.concurrent.internal.WrappedRunnable.run(WrappedRunnable.java:47) *
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
      * at de.invesdwin.util.concurrent.internal.WrappedThreadFactory.lambda$newThread$0(WrappedThreadFactory.java:44) *
        ... 2 more, see error.log

So if this does not work for int[], I don't think it will work for String[], char[], or boolean[]. Matrix equivalents will also not work.

cnuernber commented 2 years ago

For arrays of primitives you can use createArray like in Python. I never implemented the generic marshal-as-list pathway for Julia.

cnuernber commented 2 years ago

Nor did I implement bridging of java<->julia like in Python.

subes commented 2 years ago

I did a small benchmark for array (with the putGlobal workaround) vs json:

final int count = 1000;
final int[] array = new int[count];
for (int i = 0; i < count; i++) {
    array[i] = i;
}
final int iterations = 100;
final LibjuliacljScriptTaskEngineJulia engine = new LibjuliacljScriptTaskEngineJulia(this);
for (int t = 0; t < 10; t++) {
    Instant start = new Instant();
    for (int i = 0; i < iterations; i++) {
        engine.getInputs().putIntegerVector("asdf", array);
        final int[] out = engine.getResults().getIntegerVector("asdf");
        Assertions.checkEquals(array, out);
    }
    System.out.println("json: " + start);

    start = new Instant();
    for (int i = 0; i < iterations; i++) {
        putIntegerVector("asdf", array);
        final int[] out = getIntegerVector("asdf");
        Assertions.checkEquals(array, out);
    }
    System.out.println("array: " + start);
}

Array Size 10:

json: PT0.129.296.754S
array: PT0.423.123.667S

Array Size 100:

json: PT0.187.362.565S
array: PT0.422.467.251S

Array Size 500:

json: PT0.349.488.063S
array: PT0.319.867.924S

Array Size 1000:

json: PT0.623.612.129S
array: PT0.267.729.846S

Array Size 10000:

json: PT5.354.057.666S
array: PT0.317.376.845S

So it seems the break-even point is at about 500 elements, where createArray/arrayToJVM starts to outperform (with json getting steeply slower as the size grows).

cnuernber commented 2 years ago

That is fascinating, thanks for doing this. Perhaps my createArray pathway could use some work; I would expect the break-even point to be close to 100 elements or even in the 10s.

cnuernber commented 2 years ago

Could you try the same with doubles? Parsing doubles is quite CPU intensive.

subes commented 2 years ago

And here a cross-check for json/array on put/get mixed:

final int count = 10;
final int[] array = new int[count];
for (int i = 0; i < count; i++) {
    array[i] = i;
}
final int iterations = 100;
final LibjuliacljScriptTaskEngineJulia engine = new LibjuliacljScriptTaskEngineJulia(this);
for (int t = 0; t < 10; t++) {
    Instant start = new Instant();
    for (int i = 0; i < iterations; i++) {
        engine.getInputs().putIntegerVector("asdf", array);
        final int[] out = getIntegerVector("asdf");
        Assertions.checkEquals(array, out);
    }
    System.out.println("json->array: " + start);

    start = new Instant();
    for (int i = 0; i < iterations; i++) {
        putIntegerVector("asdf", array);
        final int[] out = engine.getResults().getIntegerVector("asdf");
        Assertions.checkEquals(array, out);
    }
    System.out.println("array->json: " + start);
}

Array Size 10:

json->json:   PT0.129.296.754S
array->array: PT0.423.123.667S
json->array:  PT0.122.557.419S
array->json:  PT0.476.679.829S

Array Size 100:

json->json:   PT0.187.362.565S
array->array: PT0.422.467.251S
array->json:  PT0.316.997.736S
json->array:  PT0.086.150.963S

Array Size 500:

json->json:   PT0.349.488.063S
array->array: PT0.319.867.924S
json->array:  PT0.299.669.802S
array->json:  PT0.305.925.949S

Array Size 1000:

json->json:   PT0.623.612.129S
array->array: PT0.267.729.846S
json->array:  PT0.554.050.764S
array->json:  PT0.330.586.447S

Array Size 10000:

json->json:   PT5.354.057.666S
array->array: PT0.317.376.845S
json->array:  PT5.446.018.357S
array->json:  jvm crash

signal (11): Speicherzugriffsfehler (segmentation fault)
in expression starting at none:0
jl_lookup_generic_ at /buildworker/worker/package_linux64/build/src/gf.c:2357 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2425
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1788 [inlined]
jl_call3 at /buildworker/worker/package_linux64/build/src/jlapi.c:282
unknown function (ip: 0x7f69e01f7051)
unknown function (ip: 0x7f69e01f5f4b)
unknown function (ip: 0x7f69e01ef361)
unknown function (ip: 0x7f69e01f5b6a)
unknown function (ip: 0x7f69e01f71e7)
unknown function (ip: 0x7f6b104acc7a)
unknown function (ip: 0x7f6b104a83aa)
unknown function (ip: 0x7f6b104a83aa)
unknown function (ip: 0x7f6b104a8822)
unknown function (ip: 0x7f6b188384b3)
Allocations: 2981730 (Pool: 2981115; Big: 615); GC: 2
cnuernber commented 2 years ago

OK, that last one (the crash) is due to the jsig library not being pre-loaded, I believe.

Also, are you parsing the JSON on the Julia side? What does the code on the Julia side look like?

subes commented 2 years ago

Doubles:

for (final int count : new int[] { 10, 100, 250, 500, 1000, 10000 }) {
    System.out.println("\nArray Size: " + count);
    final double[] array = new double[count];
    for (int i = 0; i < count; i++) {
        array[i] = i;
    }
    final int iterations = 100;
    final LibjuliacljScriptTaskEngineJulia engine = new LibjuliacljScriptTaskEngineJulia(this);
    for (int t = 0; t < 2; t++) {
        Instant start = new Instant();
        for (int i = 0; i < iterations; i++) {
            engine.getInputs().putDoubleVector("asdf", array);
            final double[] out = engine.getResults().getDoubleVector("asdf");
            Assertions.checkEquals(array, out);
        }
        if (t == 1) {
            System.out.println("json->json: " + start);
        }
        start = new Instant();
        for (int i = 0; i < iterations; i++) {
            putDoubleVector("asdf", array);
            final double[] out = getDoubleVector("asdf");
            Assertions.checkEquals(array, out);
        }
        if (t == 1) {
            System.out.println("array->array: " + start);
        }
        start = new Instant();
        for (int i = 0; i < iterations; i++) {
            engine.getInputs().putDoubleVector("asdf", array);
            final double[] out = getDoubleVector("asdf");
            Assertions.checkEquals(array, out);
        }
        if (t == 1) {
            System.out.println("json->array: " + start);
        }
        start = new Instant();
        for (int i = 0; i < iterations; i++) {
            putDoubleVector("asdf", array);
            final double[] out = engine.getResults().getDoubleVector("asdf");
            Assertions.checkEquals(array, out);
        }
        if (t == 1) {
            System.out.println("array->json: " + start);
        }
        start = new Instant();
        for (int i = 0; i < iterations; i++) {
            final Object juliaArray = libjulia_clj.java_api.createArray("float64",
                    new int[] { 1, array.length }, array);
            final double[] out = (double[]) libjulia_clj.java_api.arrayToJVM(juliaArray).get("data");
            Assertions.checkEquals(array, out);
        }
        if (t == 1) {
            System.out.println("arraynoVar: " + start);
        }
    }
}
Array Size: 10
json->json: PT0.052.134.610S
array->array: PT0.279.201.113S
json->array: PT0.047.210.561S
array->json: PT0.305.072.958S
arraynoVar: PT0.243.286.708S

Array Size: 100
json->json: PT0.119.502.708S
array->array: PT0.255.182.870S
json->array: PT0.101.172.022S
array->json: PT0.269.631.100S
arraynoVar: PT0.228.577.018S

Array Size: 250
json->json: PT0.237.908.112S
array->array: PT0.253.072.265S
json->array: PT0.216.793.774S
array->json: PT0.345.277.280S
arraynoVar: PT0.248.021.090S

Array Size: 500
json->json: PT0.412.828.255S
array->array: PT0.255.073.203S
json->array: PT0.416.885.463S
array->json: PT0.324.010.566S
arraynoVar: PT0.234.103.157S

Array Size: 1000
json->json: PT0.851.621.147S
array->array: PT0.247.321.124S
json->array: PT0.752.241.435S
array->json: PT0.315.612.106S
arraynoVar: PT0.211.591.847S

Array Size: 10000
json->json: PT7.950.034.664S
array->array: PT0.263.089.033S
json->array: PT7.273.363.950S
array->json: PT0.689.702.553S
arraynoVar: PT0.255.734.125S

Break-even for doubles seems to be at about 250 values.

subes commented 2 years ago

For put I am actually not using JSON but runString with variable = [1,2,3,4,5,6, ...]; get uses runString with JSON.parse(variable).
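For context, the put side is essentially string concatenation: the values are joined with commas into a script that runString then evals. A minimal sketch of that script building, with a hypothetical helper name (this is my workaround, not libjulia-clj API):

```java
import java.util.StringJoiner;

public class PutViaRunString {
    /** Build the script text "name = [v1,v2,...]" that runString would eval. */
    static String putScript(String name, double[] values) {
        StringJoiner joiner = new StringJoiner(",", name + " = [", "]");
        for (double v : values) {
            joiner.add(Double.toString(v));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(putScript("asdf", new double[] { 1.5, 2.25 }));
        // asdf = [1.5,2.25]
    }
}
```

Note that Julia only has to parse the resulting literal, and for whole numbers like 1.0 the textual form is short, which keeps the parsing cost lower than for long fractional values.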

The crash happens even with JSIG preloading (screenshot attached).

cnuernber commented 2 years ago

So for put you are literally evalling a string with the numbers serialized with commas. That makes sense, although if you are using whole numbers your double pathway isn't incurring the full double-parsing cost.

Crazy that serializing to a string and having Julia parse it would be faster than just setting the variable value directly as an array. Something definitely seems off there.

cnuernber commented 2 years ago

Did you read that discussion about using global variables, btw? Specifically the parts where they talk about how globals are fundamentally slower than function arguments due to how the compiler works?

subes commented 2 years ago

Here with actual fractional numbers (3-4 digits):

for (final int count : new int[] { 10, 100, 250, 500, 1000, 10000 }) {
    System.out.println("\nArray Size: " + count);
    final double[] array = new double[count];
    for (int i = 0; i < count; i++) {
        array[i] = i / 100D;
    }
    final int iterations = 100;
    final LibjuliacljScriptTaskEngineJulia engine = new LibjuliacljScriptTaskEngineJulia(this);
    for (int t = 0; t < 2; t++) {
        Instant start = new Instant();
        for (int i = 0; i < iterations; i++) {
            engine.getInputs().putDoubleVector("asdf", array);
            final double[] out = engine.getResults().getDoubleVector("asdf");
            Assertions.checkEquals(array, out);
        }
        if (t == 1) {
            System.out.println("json->json: " + start);
        }
        start = new Instant();
        for (int i = 0; i < iterations; i++) {
            putDoubleVector("asdf", array);
            final double[] out = getDoubleVector("asdf");
            Assertions.checkEquals(array, out);
        }
        if (t == 1) {
            System.out.println("array->array: " + start);
        }
        start = new Instant();
        for (int i = 0; i < iterations; i++) {
            engine.getInputs().putDoubleVector("asdf", array);
            final double[] out = getDoubleVector("asdf");
            Assertions.checkEquals(array, out);
        }
        if (t == 1) {
            System.out.println("json->array: " + start);
        }
        start = new Instant();
        for (int i = 0; i < iterations; i++) {
            putDoubleVector("asdf", array);
            final double[] out = engine.getResults().getDoubleVector("asdf");
            Assertions.checkEquals(array, out);
        }
        if (t == 1) {
            System.out.println("array->json: " + start);
        }
        start = new Instant();
        for (int i = 0; i < iterations; i++) {
            final Object juliaArray = libjulia_clj.java_api.createArray("float64",
                    new int[] { 1, array.length }, array);
            final double[] out = (double[]) libjulia_clj.java_api.arrayToJVM(juliaArray).get("data");
            Assertions.checkEquals(array, out);
        }
        if (t == 1) {
            System.out.println("arraynoVar: " + start);
        }
    }
}
Array Size: 10
json->json: PT0.051.733.793S
array->array: PT0.279.778.665S
json->array: PT0.051.902.462S
array->json: PT0.320.492.887S
arraynoVar: PT0.329.435.086S

Array Size: 100
json->json: PT0.139.867.391S
array->array: PT0.311.617.513S
json->array: PT0.136.425.372S
array->json: PT0.323.135.424S
arraynoVar: PT0.249.825.208S

Array Size: 250
json->json: PT0.252.158.663S
array->array: PT0.278.248.870S
json->array: PT0.261.073.317S
array->json: PT0.304.389.499S
arraynoVar: PT0.234.771.811S

Array Size: 500
json->json: PT0.442.568.924S
array->array: PT0.257.088.362S
json->array: PT0.405.058.343S
array->json: PT0.306.614.816S
arraynoVar: PT0.223.109.136S

Array Size: 1000
json->json: PT0.840.215.928S
array->array: PT0.240.794.687S
json->array: PT0.800.465.552S
array->json: PT0.327.652.251S
arraynoVar: PT0.206.024.322S

Array Size: 10000
json->json: PT7.736.429.482S
array->array: PT0.264.002.677S
json->array: PT7.173.188.868S
array->json: PT0.798.268.404S
arraynoVar: PT0.291.414.415S

I don't see a significant difference. I also added an arraynoVar measure where my putGlobal workaround is removed, so only createArray/arrayToJvm is used. It shows that the putGlobal workaround only adds a little delay compared to array->array (still in the tens-of-milliseconds range at these counts).

cnuernber commented 2 years ago

You aren't calculating anything. What about calculating sum or cumsum?

subes commented 2 years ago

Yes, I read that. My consideration is that one should wrap one's code in modules, but that is a decision for the script writer. Worst case, they should still be able to work with globals, or work around module boundaries using globals, if they wish.

subes commented 2 years ago

Calculating on the Julia side will definitely add a constant to the test, but then it is no longer a pure test of communication overhead, so the differences between the communication styles won't be as visible. But I can add that for you.

cnuernber commented 2 years ago

Setting globals, I would argue, also isn't a test of communication; calling a function is the test for that. Globals are a workaround for not being able to call functions with arguments.

cnuernber commented 2 years ago

I am also seeing that creating an array takes about 3ms on my computer regardless of array size. I am trying to track that down.

subes commented 2 years ago

I don't know how to execute cumsum or cumsum! on a vector; it throws exceptions because of missing dims. So I used an element-wise multiplication instead.

for (final int count : new int[] { 10, 100, 250, 500, 1000, 10000 }) {
    System.out.println("\nArray Size: " + count);
    final double[] array = new double[count];
    for (int i = 0; i < count; i++) {
        array[i] = i / 10000D;
    }
    final int iterations = 100;
    final LibjuliacljScriptTaskEngineJulia engine = new LibjuliacljScriptTaskEngineJulia(this);
    for (int t = 0; t < 2; t++) {
        Instant start = new Instant();
        for (int i = 0; i < iterations; i++) {
            engine.getInputs().putDoubleVector("asdf", array);
            libjulia_clj.java_api.runString("asdf = asdf.*2");
            final double[] out = engine.getResults().getDoubleVector("asdf");
            Assertions.checkNotEquals(array, out);
        }
        if (t == 1) {
            System.out.println("json->json: " + start);
        }
        start = new Instant();
        for (int i = 0; i < iterations; i++) {
            putDoubleVector("asdf", array);
            libjulia_clj.java_api.runString("asdf = asdf.*2");
            final double[] out = getDoubleVector("asdf");
            Assertions.checkNotEquals(array, out);
        }
        if (t == 1) {
            System.out.println("array->array: " + start);
        }
        start = new Instant();
        for (int i = 0; i < iterations; i++) {
            engine.getInputs().putDoubleVector("asdf", array);
            libjulia_clj.java_api.runString("asdf = asdf.*2");
            final double[] out = getDoubleVector("asdf");
            Assertions.checkNotEquals(array, out);
        }
        if (t == 1) {
            System.out.println("json->array: " + start);
        }
        start = new Instant();
        for (int i = 0; i < iterations; i++) {
            putDoubleVector("asdf", array);
            libjulia_clj.java_api.runString("asdf = asdf.*2");
            final double[] out = engine.getResults().getDoubleVector("asdf");
            Assertions.checkNotEquals(array, out);
        }
        if (t == 1) {
            System.out.println("array->json: " + start);
        }
        //                start = new Instant();
        //                for (int i = 0; i < iterations; i++) {
        //                    final Object juliaArray = libjulia_clj.java_api.createArray("float64",
        //                            new int[] { 1, array.length }, array);
        //                    libjulia_clj.java_api.runString("cumsum(asdf)");
        //                    final double[] out = (double[]) libjulia_clj.java_api.arrayToJVM(juliaArray).get("data");
        //                    Assertions.checkNotEquals(array, out);
        //                }
        //                if (t == 1) {
        //                    System.out.println("arraynoVar: " + start);
        //                }
    }
}
Array Size: 10
json->json: PT0.075.710.789S
array->array: PT0.341.968.631S
json->array: PT0.077.968.358S
array->json: PT0.345.370.005S

Array Size: 100
json->json: PT0.161.367.949S
array->array: PT0.330.260.405S
json->array: PT0.135.844.308S
array->json: PT0.299.573.076S

Array Size: 250
json->json: PT0.285.161.235S
array->array: PT0.289.693.289S
json->array: PT0.236.890.527S
array->json: PT0.315.549.184S

Array Size: 500
json->json: PT0.452.659.795S
array->array: PT0.246.381.848S
json->array: PT0.420.421.282S
array->json: PT0.302.938.989S

Array Size: 1000
json->json: PT0.868.357.597S
array->array: PT0.282.096.276S
json->array: PT0.914.847.406S
array->json: PT0.455.046.445S

Array Size: 10000
json->json: PT7.619.732.673S
array->array: PT0.319.163.334S
json->array: PT7.365.860.064S
array->json: PT0.769.500.026S

eval(Meta.parse(...)) is arguably the same kind of workaround for not having a dynamic language or reflection capabilities that would let you put globals in a different way. But those constructs are in the language because it is hard to work without them. In a pure world every application would be written in a functional language and there would be no need for state or objects. ^^

Julia is the first language I have encountered in which globals are significantly slower than other scoped variables.

But regardless, the arraynoVar test does not use a global variable, even though libjulia-clj uses the implicit jvm_refs global dict.

cnuernber commented 2 years ago

I found the speed issue :-). Array creation is now about 10x faster than it was.

Julia is much closer to Python+Numba, where everything is compiled; if you have incorrect types, then each access of the value is predicated by a typecheck, which is a much slower compilation pathway.

In C++ there are many cases where using a pointer or ref variable is much slower, as the compiler has to protect access to the data instead of loading it into a register, plus you can incur things like read-after-write stalls.

1.000-beta-7 has far faster array creation.

cnuernber commented 2 years ago

Ah, jvm_refs is a global table of the Julia objects exposed to Java. It isn't the way I access data; when you have a Julia object you have a real pointer to it. I just have to tell the Julia GC not to get rid of the object while it is still reachable via Java.
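The keep-alive mechanism described here can be sketched in plain Java as an identity-keyed table that pins objects until the other side releases them. This is illustrative only, not libjulia-clj's actual implementation; the class and method names are made up.

```java
import java.util.IdentityHashMap;
import java.util.Map;

public class GcRootTable {
    // Strong references held here keep the entries reachable, mirroring how
    // jvm_refs prevents the Julia GC from collecting objects that the Java
    // side still points at. IdentityHashMap keys on object identity, so two
    // equal-but-distinct arrays get separate entries.
    private final Map<Object, Object> refs = new IdentityHashMap<>();

    public void retain(Object handle) {
        refs.put(handle, handle);
    }

    public void release(Object handle) {
        refs.remove(handle);
    }

    public int size() {
        return refs.size();
    }

    public static void main(String[] args) {
        GcRootTable table = new GcRootTable();
        Object juliaHandle = new Object(); // stand-in for a native handle
        table.retain(juliaHandle);
        System.out.println(table.size()); // 1
        table.release(juliaHandle);       // called when Java drops the object
        System.out.println(table.size()); // 0
    }
}
```

The identity keying also explains the earlier varinfo() output: jvm_refs showed two visually identical Int8[1;;] entries because they were distinct objects.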

subes commented 2 years ago

This is a lot better now:

for (final int count : new int[] { 10, 100, 250, 500, 1000, 10000 }) {
    System.out.println("\nArray Size: " + count);
    final double[] array = new double[count];
    for (int i = 0; i < count; i++) {
        array[i] = i / 10000D;
    }
    final int iterations = 100;
    final LibjuliacljScriptTaskEngineJulia engine = new LibjuliacljScriptTaskEngineJulia(this);
    for (int t = 0; t < 2; t++) {
        Instant start = new Instant();
        for (int i = 0; i < iterations; i++) {
            engine.getInputs().putDoubleVector("asdf", array);
            final double[] out = engine.getResults().getDoubleVector("asdf");
            Assertions.checkEquals(array, out);
        }
        if (t == 1) {
            System.out.println("json->json: " + start);
        }
        start = new Instant();
        for (int i = 0; i < iterations; i++) {
            putDoubleVector("asdf", array);
            final double[] out = getDoubleVector("asdf");
            Assertions.checkEquals(array, out);
        }
        if (t == 1) {
            System.out.println("array->array: " + start);
        }
        start = new Instant();
        for (int i = 0; i < iterations; i++) {
            engine.getInputs().putDoubleVector("asdf", array);
            final double[] out = getDoubleVector("asdf");
            Assertions.checkEquals(array, out);
        }
        if (t == 1) {
            System.out.println("json->array: " + start);
        }
        start = new Instant();
        for (int i = 0; i < iterations; i++) {
            putDoubleVector("asdf", array);
            final double[] out = engine.getResults().getDoubleVector("asdf");
            Assertions.checkEquals(array, out);
        }
        if (t == 1) {
            System.out.println("array->json: " + start);
        }
        start = new Instant();
        for (int i = 0; i < iterations; i++) {
            final Object juliaArray = libjulia_clj.java_api.createArray("float64",
                    new int[] { 1, array.length }, array);
            final double[] out = (double[]) libjulia_clj.java_api.arrayToJVM(juliaArray).get("data");
            Assertions.checkEquals(array, out);
        }
        if (t == 1) {
            System.out.println("arraynoVar: " + start);
        }
    }
}
Array Size: 10
json->json: PT0.055.050.123S
array->array: PT0.069.636.372S
json->array: PT0.047.911.751S
array->json: PT0.069.720.331S
arraynoVar: PT0.030.613.722S

Array Size: 100
json->json: PT0.122.092.966S
array->array: PT0.062.040.869S
json->array: PT0.120.447.239S
array->json: PT0.073.285.413S
arraynoVar: PT0.033.531.088S

Array Size: 250
json->json: PT0.259.358.144S
array->array: PT0.047.816.375S
json->array: PT0.225.087.215S
array->json: PT0.084.670S
arraynoVar: PT0.022.407.962S

Array Size: 500
json->json: PT0.426.482.537S
array->array: PT0.049.094.889S
json->array: PT0.419.008.202S
array->json: PT0.089.325.303S
arraynoVar: PT0.016.404.977S

Array Size: 1000
json->json: PT0.796.744.561S
array->array: PT0.041.250.509S
json->array: PT0.768.937.038S
array->json: PT0.111.798.182S
arraynoVar: PT0.014.293.287S

Array Size: 10000
json->json: PT7.665.306.802S
array->array: PT0.063.944.265S
json->array: PT7.266.388.979S
array->json: PT0.575.664.593S
arraynoVar: PT0.032.270.531S

Now it should be ok to use the createArray/arrayToJvm path for any sized array. Thanks!

I will switch to the new path for the supported data types. I hope we can get boolean/char/string too at some point, though I understand those are less common types.

cnuernber commented 2 years ago

At least boolean and string.

char is IMO an abomination of bad Java design that Java devs have abused because it is the only way they get unsigned short arithmetic ;-).

Thanks again for these issues. It is super helpful to have this type of analysis, and I really do enjoy these discussions.

subes commented 2 years ago

Same here. :)

subes commented 2 years ago

And here a benchmark for matrix put/get:

final int cols = 10;
for (final int rows : new int[] { 1, 10, 25, 50, 100, 1000 }) {
    System.out.println("\nArray Size: " + rows + "*" + cols + "=" + (rows * cols));
    final double[][] matrix = new double[rows][];
    int element = 0;
    for (int i = 0; i < rows; i++) {
        final double[] row = new double[cols];
        matrix[i] = row;
        for (int j = 0; j < cols; j++) {
            row[j] = element++;
        }
    }
    final int iterations = 100;
    final LibjuliacljScriptTaskEngineJulia engine = new LibjuliacljScriptTaskEngineJulia(this);
    for (int t = 0; t < 2; t++) {
        Instant start = new Instant();
        for (int i = 0; i < iterations; i++) {
            engine.getInputs().putDoubleMatrix("asdf", matrix);
            final double[][] out = engine.getResults().getDoubleMatrix("asdf");
            Assertions.checkEquals(matrix, out);
        }
        if (t == 1) {
            System.out.println("json->json: " + start);
        }
        start = new Instant();
        for (int i = 0; i < iterations; i++) {
            putDoubleMatrix("asdf", matrix);
            final double[][] out = getDoubleMatrix("asdf");
            Assertions.checkEquals(matrix, out);
        }
        if (t == 1) {
            System.out.println("array->array: " + start);
        }
        start = new Instant();
        for (int i = 0; i < iterations; i++) {
            engine.getInputs().putDoubleMatrix("asdf", matrix);
            final double[][] out = getDoubleMatrix("asdf");
            Assertions.checkEquals(matrix, out);
        }
        if (t == 1) {
            System.out.println("json->array: " + start);
        }
        start = new Instant();
        for (int i = 0; i < iterations; i++) {
            putDoubleMatrix("asdf", matrix);
            final double[][] out = engine.getResults().getDoubleMatrix("asdf");
            Assertions.checkEquals(matrix, out);
        }
        if (t == 1) {
            System.out.println("array->json: " + start);
        }
    }
}
Array Size: 1*10=10
json->json: PT0.051.907.695S
array->array: PT0.066.664.417S
json->array: PT0.053.828.212S
array->json: PT0.064.158.726S

Array Size: 10*10=100
json->json: PT0.190.794.079S
array->array: PT0.050.262.805S
json->array: PT0.182.740.254S
array->json: PT0.058.082.169S

Array Size: 25*10=250
json->json: PT0.431.007.579S
array->array: PT0.047.016.940S
json->array: PT0.416.722.445S
array->json: PT0.075.436.332S

Array Size: 50*10=500
json->json: PT0.815.633.470S
array->array: PT0.042.914.069S
json->array: PT0.797.780.094S
array->json: PT0.074.112.403S

Array Size: 100*10=1000
json->json: PT1.564.662.764S
array->array: PT0.041.754.057S
json->array: PT1.534.069.476S
array->json: PT0.090.441.305S

Array Size: 1000*10=10000
json->json: PT15.040.703.335S
array->array: PT0.062.597.749S
json->array: PT14.946.558.415S
array->json: PT0.463.554.137S
subes commented 2 years ago

The test cases got a bit slower with the switch to createArray/arrayToJvm.

Before: [screenshot]

After: [screenshot]

But this should be fine since larger vectors/matrices are significantly faster now.

Though I had to implement a fallback: when runString(...) does not return a JuliaArray but instead something like a JuliaTuple, I have to fall back to JSON parsing, since there is no API to get at the other types. That is not a problem, since in most cases the correct array type is returned; only a few expressions, e.g. size(someArray), return a JuliaTuple instead of an array.

subes commented 2 years ago

Thus we can close this issue. Thanks a lot!

subes commented 2 years ago

Another test today after a reboot: [screenshot]

cnuernber commented 2 years ago

Still slower.

I want to argue strongly against any sort of array-of-arrays matrix definition. I don't know why Java people do this, but it is a bad definition: it is slower, loses memory locality, and makes any sort of in-place transpose or view pathway impossible, because you are baking the structure of the matrix - something that can change - into the data storage facility.

A matrix in literally every other language (C, numpy, Julia, C++, R, etc.) is a pointer to data along with a shape and possibly strides. With that you can do in-place transpose and sub-rectangle selection, creating views of your data. Transferring such a matrix between systems is often a single memcpy call, and dropping down to BLAS, MKL or something like that is easy and straightforward. None of this is true if your matrix definition is double[][]. Moving to higher dimensions makes the array-of-arrays (of-arrays-of-arrays) approach make even less sense.
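The flat-buffer-plus-strides representation described above can be sketched in a few lines of Java (all names here are illustrative, not an existing API): transposition becomes a metadata swap instead of a data copy.

```java
// Minimal strided 2-D view over a flat row-major buffer (illustrative sketch).
final class Mat {
    final double[] data;
    final int rows, cols, rowStride, colStride, offset;

    Mat(double[] data, int rows, int cols, int rowStride, int colStride, int offset) {
        this.data = data; this.rows = rows; this.cols = cols;
        this.rowStride = rowStride; this.colStride = colStride; this.offset = offset;
    }

    static Mat rowMajor(double[] data, int rows, int cols) {
        return new Mat(data, rows, cols, cols, 1, 0);
    }

    double get(int r, int c) {
        return data[offset + r * rowStride + c * colStride];
    }

    // O(1) transpose: swap shape and strides, no data is copied.
    Mat transpose() {
        return new Mat(data, cols, rows, colStride, rowStride, offset);
    }
}
```

transpose() here shares the same buffer and only changes the interpretation; doing the same with double[][] forces a full allocation and copy.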

Your last matrix example, 1000*10, means that something had to do either 1000 arraycopy calls or 1000 memcpy calls in order to transfer that fragmented definition into a single physical buffer. You get the exact same result doing createArray with a single buffer of 10000 and a shape of [1000 10], but the magic is that this is a single memcpy pathway and merely a different interpretation of the same buffer.
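The copy count can be made concrete with a sketch (a hypothetical helper, not libjulia-clj code): defragmenting a double[][] costs one bulk copy per row, whereas a flat buffer plus a shape of [rows, cols] transfers in a single copy.

```java
// Cost of transferring a fragmented double[][]: one bulk copy per row just
// to reach the single contiguous buffer that a flat-buffer-plus-shape matrix
// already has (illustrative sketch, not libjulia-clj internals).
final class Flatten {
    static double[] flatten(double[][] matrix) {
        final int rows = matrix.length;
        final int cols = matrix[0].length;
        final double[] flat = new double[rows * cols];
        for (int r = 0; r < rows; r++) {
            // 'rows' separate copies; a flat representation needs only one
            System.arraycopy(matrix[r], 0, flat, r * cols, cols);
        }
        return flat;
    }
}
```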

What I have seen in other Java systems is that they bake the definition of a matrix as an array of arrays deep into their code, and then it is impossible to change later. Haifeng moved to a sane definition for his MKL bindings - but because he started with the double[][] pathway, things like PCA are much slower than in other systems, and there is a nontrivial transformation cost to use MKL from his canonical definition. I want to make at least an attempt to dissuade you from making the exact same mistake.

Sorry if this seems heavy-handed; it is a private irritation of mine. I hope that when you read data out of your storage system you don't automatically read it into an array of arrays, else you are locking yourself into a lower performance tier for many if not all operations - or, perhaps worse, exposing an array of arrays as a required interface definition that clients have to support, thus locking the entire system into a lower performance tier via architectural decisions.

subes commented 2 years ago

Yes, these are valid thoughts. The storage system does not care about the representation of the contents. That is what is abstracted away by the ISerde (stands for Serializer/Deserializer) implementation. The database and the data pipelines only care about arrays of bytes internally.

Regarding language integration: Most cases I am dealing with right now use small datasets and double[][] is intuitive to use even for novice developers. Also it works fast enough for small datasets and when calls are not done in tight loops. But it definitely is not the most efficient. Regarding multidimensional arrays I wanted to implement a putMultidimensional/getMultidimensional (or maybe a better name would be putNDArray/getNDarray) pathway that applies this approach. But still with bigger sized datasets it will be better to write the data to disk and read it as a file in the scripting language. Worst case as CSV files, better as binary. If the datasets fits into memory (off-heap), Arrow seems to be a nice solution to transport without the persistence overhead of the file system. Right now this is possible with the language integrations that I am providing, though I make the easy path as easy as possible (by providing paths for double[] and double[][] which is easy to handle and fits with most language integration libraries).

What I am missing from existing ndarray implementations in Java is that you still have to transform back to double[] or double[][], because we don't have a dominant framework like numpy in Java land that enforces a more efficient storage across frameworks (think commons-math, smile, and almost any other framework). So you always have to wastefully copy data. In most cases just going with double[][] directly is then faster than transforming from an ndarray representation. Some frameworks have ndarray implementations, but their underlying storage differs, so you again have to transform datasets.

The world looks different if you implement the frameworks yourself. In that case tech.ml.dataset looks like a great basis for an ndarray in Java, but it being available only to Clojure code right now prevents me from using it as the representation of an ndarray pathway in my frameworks - even though its Arrow integration would make it ideal at the other end, allowing zero-copy memory sharing between languages (or even processes). I know it was a lot of work to develop something like tech.ml.dataset; that is also the reason why I am not considering implementing something like this myself right now. Instead I would rather integrate something that already does it. I was glad something like tablesaw was created, but as you know yourself, it did not convince me that it does the job well enough to become a standard.

It is similar to byte buffer implementations in Java. We have agrona-DirectBuffer, chronicle-bytes, java-ByteBuffer, netty-ByteBuf, and some more exotic ones like indeed-buffer. Some provide support for memory mapping of files, some only allow heap buffers, others allow off-heap memory. Some support int addresses, others long. In the end I wrote my own abstraction above this by taking the agrona API as a basis and writing my interop layer on top of it (IByteBuffer and IMemoryBuffer). This is then the basis for my ISerde abstraction, which can work with any framework and allows moving data between frameworks without copy overhead (or reducing it to a minimum). I guess in the end, if/when I need it, I will most likely develop something like this as an abstraction for an INDArray. Though this also has drawbacks: e.g. netty-ByteBuf is an abstract base class instead of an interface to make it ~5% faster, while I chose an interface because only that allows writing a proper abstraction over other frameworks. And even then, using delegate classes (composition instead of inheritance) adds more overhead due to pointer dereferencing, so that solution will also not bring the maximum performance possible. Though where it is possible (e.g. with agrona), I instead inherit from the existing buffer class and implement my interface on top of it.

So I don't know yet whether I can use tech.ml.dataset at some point, similar to agrona-DirectBuffer, as the basis for my INDArray abstraction.

Though specifically with libjulia-clj: the double[][] pathway with JuliaArray (createArray/arrayToJvm) seems to transform the array from row-major (in libjulia-clj) to column-major (in Julia). This effectively transposed (or otherwise mangled) the data when I put the double[][] directly into createArray. I decided instead to allocate my own double[] and write a representation into it that treats the double[][] as column-major, so that Arrays.toString(matrix) in Java generates the same output as println(matrix) in Julia. I guess creating the double[] myself instead of giving the double[][] to libjulia-clj is even more inefficient, because Julia will again copy the double[] into Julia memory space; my workaround actually adds another copy operation to the already horrendous copying (as you also wrote). Thus, when considering an INDArray abstraction, the assumptions made when converting between row-major and column-major implementations will also need good consideration (there are more efficient ways to do this than creating copies). An ideal INDArray abstraction should be able to take the most efficient path in each case. And similar to how my IByteBuffer still allows convenient conversion to/from byte[] and even has an on-heap implementation, I would also want convenience functions to transform the INDArray into double[] or double[][] (or even use an INDArray implementation that has those as its underlying storage) in order to properly and easily integrate with commons-math or smile, where different assumptions are made.

commons-math or smile for double[]/double[][], or tablesaw for its own specific storage, are the equivalents of the various databases that I have integrated e.g. into ezdb, where some databases support various buffer implementations while others only support byte[].

With an abstraction like that I can then use existing frameworks conveniently (some being more expensive than others to convert into). Though to get the maximum speed I might still consider to reimplement the things on top of my own abstractions.

So in summary I am completely with you in that argument. And it is something that I would like to improve at some point. The above is the plan that I have been thinking about for a while now.

subes commented 2 years ago

Here is the example of the additional copy that I have to do because libjulia-clj converts from row-major into column-major:

    @Override
    public void putDoubleMatrix(final String variable, final double[][] matrix) {
        IScriptTaskRunnerJulia.LOG.debug("> put %s", variable);
        final int cols = matrix[0].length;
        final int rows = matrix.length;
        // pack the row-major double[][] into a column-major double[]
        final double[] vector = new double[rows * cols];
        int i = 0;
        for (int c = 0; c < cols; c++) {
            for (int r = 0; r < rows; r++) {
                vector[i] = matrix[r][c];
                i++;
            }
        }
        // dims are swapped so that Julia's column-major view matches the Java layout
        final Object array = libjulia_clj.java_api.createArray("float64", new int[] { cols, rows }, vector);
        putGlobalFunction.invoke(variable, array);
    }

If this conversion could be disabled, libjulia-clj could copy the data directly from the double[][] in the correct format, which would be more efficient and avoid the intermediary double[].

subes commented 2 years ago

Though still needing the double[][] or double[] in the first place is bad for something like an INDArray. That abstraction could offer convenience methods for this path, but more efficient would be a path that reads the content from shared memory into Julia without copying (or with as little copying as possible, e.g. with Arrow on both sides), or worst case via a file transfer. That is something an INDArray abstraction could provide without exposing such details to the user.

subes commented 2 years ago

The INDArray abstraction should also implement equals/hashCode/toString so that they produce the same results regardless of the underlying implementation.
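A minimal sketch of such implementation-independent equality and hashing, assuming a hypothetical INDArray interface: both are derived from the shape and the logical element sequence only, so any two implementations holding the same data compare equal.

```java
import java.util.Arrays;

// Implementation-independent equality/hashing for a hypothetical ndarray
// abstraction: derived from shape plus element values only, so two backends
// holding the same logical data compare equal. Names are assumptions.
interface INDArray {
    int[] shape();
    double getFlat(long index); // elements in a fixed logical (row-major) order
    long size();

    static boolean contentEquals(INDArray a, INDArray b) {
        if (!Arrays.equals(a.shape(), b.shape())) {
            return false;
        }
        for (long i = 0; i < a.size(); i++) {
            // exact comparison; a tolerance-based variant is also conceivable
            if (Double.compare(a.getFlat(i), b.getFlat(i)) != 0) {
                return false;
            }
        }
        return true;
    }

    static int contentHashCode(INDArray a) {
        int h = Arrays.hashCode(a.shape());
        for (long i = 0; i < a.size(); i++) {
            h = 31 * h + Double.hashCode(a.getFlat(i));
        }
        return h;
    }
}
```

Fixing a logical element order in the contract is what makes the result independent of whether the backend stores data row-major, column-major, or strided.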

cnuernber commented 2 years ago

Interesting - so far I avoided hashing because of issues such as (hash (float 2)) != (hash (double 2)), and the same with equality; for floating point numbers, equality doesn't make much sense compared to a subtraction with an error threshold.
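The same pitfalls can be rendered in plain Java (one hedge: for some values, such as 2.0, the boxed float/double hash codes happen to coincide in Java; 2.5 is a value where they differ):

```java
// Cross-type hash and equality pitfalls for floating point in Java, plus
// the usual tolerance-based comparison. 2.5 is used because its float and
// double hash codes differ (for 2.0 they happen to coincide).
final class FloatingPointEquality {
    static boolean nearlyEquals(double a, double b, double epsilon) {
        return Math.abs(a - b) <= epsilon;
    }

    public static void main(String[] args) {
        // Same numeric value, different hash codes across float/double:
        System.out.println(Float.hashCode(2.5f) == Double.hashCode(2.5)); // false
        // Boxed equality across types always fails, even for equal values:
        System.out.println(Float.valueOf(2.0f).equals(Double.valueOf(2.0))); // false
        // Exact equality fails after rounding; a threshold comparison works:
        System.out.println(0.1 + 0.2 == 0.3);                    // false
        System.out.println(nearlyEquals(0.1 + 0.2, 0.3, 1e-9)); // true
    }
}
```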

subes commented 2 years ago

Sure, the hash of a float will be different from the hash of a double. I would not try to make them common at that level - just common between any implementation of float, and between any implementation of double, separately.

Also, what goes against an INDArray abstraction right now is that we don't have generics for primitives in Java. So we either have to build one INDArray that can convert from/to all the other types (like IByteBuffer), or have INDArrayDouble, INDArrayFloat, INDArrayInt and so on. Deciding on the first approach was simple for IByteBuffer because the underlying storage is always some form of byte[]. Though there are also implementations of DoubleBuffer, IntBuffer and so on that I currently don't support. For an INDArray that would have to be solved in a good way.
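The first approach could look like this sketch - a single accessor interface whose typed getters convert on read, analogous to the IByteBuffer description above (all names are hypothetical):

```java
// One way around missing primitive generics: a single abstraction whose
// typed accessors convert on read. A float[]-backed implementation still
// answers getDouble/getInt calls. Names are illustrative only.
interface INDArrayAccessor {
    double getDouble(long index);

    default float getFloat(long index) { return (float) getDouble(index); }
    default int getInt(long index) { return (int) getDouble(index); }
    default long getLong(long index) { return (long) getDouble(index); }
}

final class FloatArrayAccessor implements INDArrayAccessor {
    private final float[] data;

    FloatArrayAccessor(float[] data) {
        this.data = data;
    }

    @Override
    public double getDouble(long index) {
        return data[(int) index];
    }
}
```

The trade-off is a widening/narrowing conversion on every access; the per-type alternative (INDArrayDouble, INDArrayFloat, ...) avoids that at the cost of N parallel interfaces.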

subes commented 2 years ago

Also regarding base implementations, I have been looking at nd4j for a while now: https://deeplearning4j.konduit.ai/ It also supports GPUs, and the last time I looked at it, it supported both OpenBLAS and MKL. So it might provide some benefits in comparison to tech.ml.dataset - another argument why an abstraction would be best: pick the implementation that best suits a given task, with simplified conversions that add as little overhead as possible.

subes commented 2 years ago

Funny, it seems they built a higher level abstraction on the libpython JavaCPP layer that we talked about a few days ago: https://deeplearning4j.konduit.ai/python4j/tutorials/quickstart

cnuernber commented 2 years ago

tmd supports neanderthal, which has MKL support, but tmd specifically is just a column-major in-memory database system - ND4J doesn't overlap much aside from having an NDBuffer definition. I haven't built a bridge to ND4J because my userbase uses neanderthal, but an efficient pathway from a tech.v3.tensor datatype to an nd4j NDArray datatype is definitely possible.

Because there are so many linalg libraries available for Java, I didn't implement any linalg systems aside from defining a cross-language NDBuffer implementation with a stable ABI. I stopped there.

subes commented 2 years ago

Ah ok, neanderthal also supports GPU. Seems to be the choice when you are in clojure anyhow.

I agree that there are enough linalg libraries out there (ojalgo being another one of those). Integration between each of them is lacking and the least common denominator is double[] or double[][] sadly.

cnuernber commented 2 years ago

Lol, well, you can't fight the wind :-)...

subes commented 2 years ago

neanderthal seems to be a lot better than nd4j also: https://neanderthal.uncomplicate.org/articles/benchmarks.html

cnuernber commented 2 years ago

I think you could create benchmarks that go the other way if you wanted to, but I know Dragan, and I know that he has literally spent the last 7 years of his life making neanderthal as fast as possible for precisely the common use cases that users run into. I also have a relationship with him, so if you want something specific to be faster I could probably make it happen very quickly.

When I built cortex I used direct cudnn bindings and got blistering speed, but if I were going to do the same again I would most likely use neanderthal along with his deep diamond nd tensor library. My first version was built against JavaCPP, and I had about a million issues with the design of the library, ranging from things that weren't threadsafe that should have been, to complete overdesign - specifically, he had a garbage collector loop that measured the amount of native memory you were allocating, and if it went above the JVM's runtime threshold he would start manually calling System.gc() in a loop in a thread, thus destroying performance. Imagine you have a neural network such as resnet that needs literally several GB of memory to run - on the GPU - and you find out that the library you are using to bind to native libraries is spinning the GC, trying to make the GPU memory fit under the JVM memory threshold. It was just stupid and very hard to find. So I have extremely bad experiences with JavaCPP, although I do like the concept.

cnuernber commented 2 years ago

All that aside, using JNA is a philosophical difference, not just a choice due to bad past experiences. Dynamically binding to native libraries when possible means you cut down on the required versions of the libraries involved, since, if you are careful, one dynamic binding can work with several different versions of the same native library. This is huge when you are talking about binding to specific cudnn and potentially specific cuda versions while trying to, e.g., do satellite imagery analysis where you don't control the runtime environment and thus the CUDA versions on it.

subes commented 2 years ago

lolz, that is indeed funny. Could be posted to thedailywtf. ^^

Similar story: https://github.com/burningwave/core/issues/10 - tl;dr: it provides a convenient way to turn off the Java module system (useful during development), but this "framework for frameworks" spun up threads that watched some temp directories and deleted them in a tight loop.