clj-python / libpython-clj

Python bindings for Clojure
Eclipse Public License 2.0

Make JNA Binding available to Java clients #191

Closed. subes closed this issue 2 years ago

subes commented 2 years ago

As discussed here: https://github.com/cnuernber/libjulia-clj/issues/3

Please generate some Java classes for libpython-clj so that I can integrate it here: https://github.com/invesdwin/invesdwin-context-python/tree/master/invesdwin-context-python-parent/invesdwin-context-python-runtime-libpythonclj

Thanks a lot!

cnuernber commented 2 years ago

Now that is interesting - didn't expect that error.

subes commented 2 years ago

I can't find a way to extract __builtins__ or exec via getAttr. I tried a few ways. Maybe you can create a code example?

subes commented 2 years ago

Also, I think I will need exec, because otherwise my arbitrary code (which might define global variables) would define them in the local scope of the wrapper function instead. My hope is to escape that via a fast path for exec (which should remove the JVM-side overhead as much as possible). This should then be at least as fast as Jep, which uses the same approach. Interpreted python should be fine; if one wants to precompile things, just define functions and call them instead - all via exec.

cnuernber commented 2 years ago

Let's take a step back. It seems like you really want to do:

loop:
    setGlobal
    setGlobal
    executeScript
    getGlobal

As opposed to:

loop:
    call(a1, a2)

This is the use case you are targeting, correct?

You are concerned that the arbitrary code will define or access variables that are in some local scope?

subes commented 2 years ago

Yes, I would like to have the same API regardless of the integration, as long as it works fast enough for general testing. Using more advanced pathways like "loop: call(a1, a2)" is an optimization that can still be made manually when sticking to one specific integration (the code will then only work with libpython-clj, since no other integration supports the fastcall or exported-function approach).

cnuernber commented 2 years ago

Gotcha. Let's just make that pathway easier for you :-).

subes commented 2 years ago

Using makeFastcallable + fastcall with a fastcallable context: https://github.com/invesdwin/invesdwin-context-python/commit/864d4775d3ecc48d4f95e85a6a3e55f2503affa2#diff-d171a9c47f4a6133c8d3a1351b8a14d10554860243a8a92687629a12e5d26072

Gives the following exception:

Caused by - java.lang.Exception: Item libpython_clj2.python.fn$make_fastcallable$reify__13773@d995432 is not convertible to a C pointer
        at tech.v3.datatype.ffi.ptr_value$unchecked_ptr_value.invokeStatic(ptr_value.clj:17)
        at tech.v3.datatype.ffi.ptr_value$unchecked_ptr_value.invokePrim(ptr_value.clj)
        at tech.v3.datatype.ffi.ptr_value$ptr_value.invokeStatic(ptr_value.clj:27)
        at tech.v3.datatype.ffi.ptr_value$ptr_value.invokePrim(ptr_value.clj)
        at tech.v3.datatype.ffi.jna$ptr_value.invokeStatic(jna.clj:65)
        at tech.v3.datatype.ffi.jna.G__15248.PyObject_CallObject(Unknown Source)
        at tech.v3.datatype.ffi.jna.G__15248$invoker_PyObject_CallObject.invoke(Unknown Source)
        at libpython_clj2.python.ffi$PyObject_CallObject.invokeStatic(ffi.clj:458)
        at libpython_clj2.python.ffi$PyObject_CallObject.invoke(ffi.clj:458)
        at libpython_clj2.python.fn$fastcall.invokeStatic(fn.clj:337)
        at libpython_clj2.python.fn$fastcall.invoke(fn.clj:337)
        at clojure.lang.Var.invoke(Var.java:393)
        at libpython_clj2.java_api$_fastcall.invokeStatic(java_api.clj:278)
        at libpython_clj2.java_api$_fastcall.invoke(java_api.clj:264)
        at libpython_clj2.java_api.fastcall(Unknown Source)

Though I think this is because makeFastcallable should never be used with fastcall: makeFastcallable is only to be used with call, while fastcall can only be used with functions obtained via globals.get(...). At least that is my current hypothesis.
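If that is right, the correct pairings would look roughly like this (a sketch; "add" is a hypothetical python function and the caller must hold the GIL):

import java.util.Map;

public class FastcallSketch {
    // Sketch of the hypothesis above: fastcall takes a raw function object,
    // makeFastcallable wraps one for repeated use with call.
    public static void demo(final Map<Object, Object> globals) throws Exception {
        // assumes e.g. "def add(a, b): return a + b" was defined beforehand
        final Object addFn = globals.get("add");

        // fastcall works directly on the function object from globals:
        final Object viaFastcall = libpython_clj2.java_api.fastcall(addFn, 1, 2);

        // makeFastcallable is paired with call, never with fastcall:
        try (AutoCloseable fast = libpython_clj2.java_api.makeFastcallable(addFn)) {
            final Object viaCall = libpython_clj2.java_api.call(fast, 1, 2);
        }
    }
}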

cnuernber commented 2 years ago

That is correct.

I have a new release out (2.009). I tested a fast pathway that is, I think, what you are looking for. There is no longer a runString method, as for Python you can specify that the string is an expression that should return a value.

So there are new functions: setGlobal, getGlobal, runStringAsInput, and runStringAsFile.

The exact test I ran was:

loop:
    setGlobal
    setGlobal
    runStringAsInput
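In Java terms that pathway looks roughly like this (a sketch; the caller must hold the GIL, and the expression is just an example):

public static Object spreadViaGlobals(final double bid, final double ask) {
    // caller must hold the GIL
    libpython_clj2.java_api.setGlobal("bid", bid);
    libpython_clj2.java_api.setGlobal("ask", ask);
    // runStringAsInput evaluates the string as a python expression and returns its value
    return libpython_clj2.java_api.runStringAsInput("abs(ask-bid)");
}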

Timings (on my laptop):

Python fn calls/ms 1073.8821824248573
Python fastcall calls/ms 2581.553321769592
Python fastcallable calls/ms 2573.4040936080783
Python setglobal pathway calls/ms 2035.7445349150353

On my laptop with manual GIL management enabled:

libpython-clj2.java-api-test> (base-japi-test)
Python fn calls/ms 1307.0113225201492
Python fastcall calls/ms 2578.096903779094
Python fastcallable calls/ms 2770.40991109713
Python setglobal pathway calls/ms 1904.3004365439267
nil
libpython-clj2.java-api-test> (base-japi-test)
Python fn calls/ms 1186.4701643639426
Python fastcall calls/ms 3172.0441707658306
Python fastcallable calls/ms 3088.527501521347
Python setglobal pathway calls/ms 2380.1786812036876

On Travis:

Python fn calls/ms 656.5465952937315
Python fastcall calls/ms 2274.610608954931
Python fastcallable calls/ms 2284.8729615431616
Python setglobal pathway calls/ms 1613.6083882876983

Relative speeds will probably be similar.

subes commented 2 years ago

This looks really good!

I tried integrating it. https://github.com/invesdwin/invesdwin-context-python/commit/66d0479ef7c7ab4be6b7414e410778c82ea47225

In the docs the function signature is like this:

  java_api.setGlobal("bid", 1);
  java_api.setGlobal("ask", 2);

But setGlobal/getGlobal have one more parameter than expected: (screenshot)

Also, when I call libpython_clj2.java_api.runStringAsInput("") or libpython_clj2.java_api.runStringAsInput("1+1"), the JVM crashes.

I am using version 2.009 with Python 3.9.7.

cnuernber commented 2 years ago

Sorry, I declared the fn signatures incorrectly for that. Looking to replicate the runStringAsInput crash. You are calling all of these with the lock held, correct?

subes commented 2 years ago

Just checked, yes the GIL lock was missing. Sorry :D

subes commented 2 years ago

Ok, now I get the expected behavior: https://github.com/invesdwin/invesdwin-context-python/blob/master/invesdwin-context-python-parent/invesdwin-context-python-runtime-libpythonclj/src/main/java/de/invesdwin/context/python/runtime/libpythonclj/internal/UncheckedPythonEngineWrapper.java

2022-01-02 01:42:56.654 [ |7-10:InputsAndResul] ERROR de.invesdwin.ERROR.process                                   - processing #00000010
de.invesdwin.context.log.error.LoggedRuntimeException: #00000010 clojure.lang.ArityException: Wrong number of args (3) passed to: libpython-clj2.java-api/-setGlobal
        ... 13 omitted, see following cause or error.log
Caused by - clojure.lang.ArityException: Wrong number of args (3) passed to: libpython-clj2.java-api/-setGlobal
        at clojure.lang.AFn.throwArity(AFn.java:429)
        at clojure.lang.AFn.invoke(AFn.java:40)
        at libpython_clj2.java_api.setGlobal(Unknown Source)
      * at de.invesdwin.context.python.runtime.libpythonclj.internal.UncheckedPythonEngineWrapper.set(UncheckedPythonEngineWrapper.java:73) *
      * at de.invesdwin.context.python.runtime.libpythonclj.LibpythoncljScriptTaskInputsPython.putString(LibpythoncljScriptTaskInputsPython.java:57) *
      * at de.invesdwin.context.python.runtime.contract.hello.HelloWorldScript$1.populateInputs(HelloWorldScript.java:29) *
      * at de.invesdwin.context.python.runtime.libpythonclj.LibpythoncljScriptTaskRunnerPython.run(LibpythoncljScriptTaskRunnerPython.java:43) *

I will optimize the GIL lock management further so that no ThreadLocal is used. Having it completely in my own hands makes this easier now.

cnuernber commented 2 years ago

Release 2.010 is up and fixes the setGlobal pathway.

subes commented 2 years ago

The signature looks good, though calling it causes the following error now:

2022-01-02 01:59:05.361 [ |7-1:InputsAndResult] ERROR de.invesdwin.ERROR.process                                   - processing #00000001
de.invesdwin.context.log.error.LoggedRuntimeException: #00000001 java.lang.ClassCastException: class [B cannot be cast to class java.lang.Throwable ([B and java.lang.Throwable are in module java.base of loader 'bootstrap')
        ... 13 omitted, see following cause or error.log
Caused by - java.lang.ClassCastException: class [B cannot be cast to class java.lang.Throwable ([B and java.lang.Throwable are in module java.base of loader 'bootstrap')
        at libpython_clj2.python.ffi$untracked__GT_python.invokeStatic(ffi.clj:622)
        at libpython_clj2.python.ffi$untracked__GT_python.doInvoke(ffi.clj:603)
        at clojure.lang.RestFn.invoke(RestFn.java:410)
        at clojure.lang.Var.invoke(Var.java:384)
        at libpython_clj2.java_api$fn__176$fn__177.invoke(java_api.clj:140)
        at libpython_clj2.java_api$_setGlobal.invokeStatic(java_api.clj:247)
        at libpython_clj2.java_api$_setGlobal.invoke(java_api.clj:243)
        at libpython_clj2.java_api.setGlobal(Unknown Source)
      * at de.invesdwin.context.python.runtime.libpythonclj.internal.UncheckedPythonEngineWrapper.set(UncheckedPythonEngineWrapper.java:85) *
      * at de.invesdwin.context.python.runtime.libpythonclj.LibpythoncljScriptTaskInputsPython.putByteVector(LibpythoncljScriptTaskInputsPython.java:28) *
      * at de.invesdwin.context.python.runtime.contract.InputsAndResultsTestByte$1.populateInputs(InputsAndResultsTestByte.java:61) *
      * at de.invesdwin.context.python.runtime.libpythonclj.LibpythoncljScriptTaskRunnerPython.run(LibpythoncljScriptTaskRunnerPython.java:43) *
      * at de.invesdwin.context.python.runtime.contract.AScriptTaskPython.run(AScriptTaskPython.java:12) *
      * at de.invesdwin.context.python.runtime.contract.InputsAndResultsTestByte.testByte(InputsAndResultsTestByte.java:99) *
      * at de.invesdwin.context.python.runtime.contract.InputsAndResultsTests.test(InputsAndResultsTests.java:24) *
      * at de.invesdwin.context.python.runtime.contract.InputsAndResultsTests$1.run(InputsAndResultsTests.java:47) *
      * at de.invesdwin.util.concurrent.internal.WrappedRunnable.run(WrappedRunnable.java:47) *
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
      * at de.invesdwin.util.concurrent.internal.WrappedThreadFactory.lambda$0(WrappedThreadFactory.java:44) *
        ... 2 more, see error.log

subes commented 2 years ago

I dug a bit deeper. setGlobal works for primitive types and Strings. When giving it arrays (in this example a byte[]), the exception is thrown.
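A minimal repro (a sketch; GIL held, the variable names are arbitrary):

libpython_clj2.java_api.setGlobal("i", 1);                      // works
libpython_clj2.java_api.setGlobal("s", "hello");                // works
libpython_clj2.java_api.setGlobal("b", new byte[] { 1, 2, 3 }); // throws the ClassCastException above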

cnuernber commented 2 years ago

A primitive array is a completely logical thing to setGlobal. That would end up as a numpy array of int8; is that reasonable to you?

subes commented 2 years ago

With the previous way:

final Map<?, ?> mainModule = libpython_clj2.java_api.runStringAsFile("");
globals = (Map<Object, Object>) mainModule.get("globals");
// then globals.put(...) / globals.get(...) to exchange variables

These were converted into the specific types appropriate to the Java type (see the unit tests for each type in invesdwin-context-python-contract). The test cases were green with that.

It would be great if setGlobal(...) did the same type conversion as globals.put(...) did before. The above code for getting the globals does not work anymore, since mainModule.get("globals") now returns null; this is not a problem as long as setGlobal/getGlobal work, though. So if these were numpy arrays previously, that is absolutely fine.

Also matrices byte[][] and other primitive matrix types worked with the previous solution.

cnuernber commented 2 years ago

It was. I used a super-low-level fast path (untracked->python) in the setGlobal pathway. I will make it work the same, in that it will fall back to a more general pathway if the user doesn't pass in a python primitive type.

subes commented 2 years ago

Sounds good. :) I guess some instanceof checks won't hurt performance too much.

cnuernber commented 2 years ago

They come late in the dispatch, so they don't hurt. The general pathway copies the array of data into a python list, which is IMO horrible in its own special way, but users can use createArray if they want something faster.

cnuernber commented 2 years ago

The thing is, a byte[][] isn't a matrix type - it could be ragged. A byte[] combined with a shape, such as an int array, is a matrix type.
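So the matrix pathway would look roughly like this (a sketch; I'm assuming createArray takes a datatype name, a shape, and a flat array - check the javadoc for the exact signature; GIL held):

final byte[] flat = new byte[] { 1, 2, 3, 4, 5, 6 };
final int[] shape = new int[] { 2, 3 }; // 2 rows x 3 columns
// assumed signature: createArray(datatype, shape, flatData)
final Object matrix = libpython_clj2.java_api.createArray("int8", shape, flat);
libpython_clj2.java_api.setGlobal("m", matrix);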

subes commented 2 years ago

Ok, I should look into using createArray anyhow; I also wanted to use that in the libjulia-clj integration but did not go for that optimization yet.

cnuernber commented 2 years ago

I am setting up things to auto-call that in setGlobal for primitive arrays and primitive arrays-of-arrays in the case where the array-of-arrays has a constant inner length.

There is also a copyData call now, so if you have an allocated numpy array you can quickly copy an appropriately typed flat array of data into it. That copy pathway will not work with arrays-of-arrays.

New release is up - 2.012. This contains copyData and an updated setGlobal pathway that will convert simple arrays automagically into numpy arrays via the createArray pathway.
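The intended preallocate-and-copy pattern, as a sketch (the copyData argument order is an assumption - check the 2.012 javadoc; running, fillWithLatestTicks, and the python process function are hypothetical):

final double[] buffer = new double[1024];
final Object ndarray = libpython_clj2.java_api.createArray("float64", new int[] { 1024 }, buffer);
libpython_clj2.java_api.setGlobal("data", ndarray);
while (running) {                                             // hypothetical loop condition
    fillWithLatestTicks(buffer);                              // hypothetical producer
    libpython_clj2.java_api.copyData(ndarray, buffer);        // quick copy, no reallocation
    libpython_clj2.java_api.runStringAsFile("process(data)"); // hypothetical python fn
}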

subes commented 2 years ago

I tried it a bit, but I found no good way to make numpy arrays as the default work across engines.

I think it would be better to also make libjulia-clj transmit lists by default, and maybe have some way to opt in to automatic numpy transmission (could be a system property like "libpython_clj.manual_gil", or a setGlobalNumpy(...) or setGlobal(key, value, numpy=true) overload). I think it would be best to use the least common denominator here (even if the optimization is interesting), because it changes the semantics in unexpected ways when trying to reuse scripts. One could also imagine restricted environments or offline installations that don't have numpy support or cannot install numpy.

Or just let users explicitly use createArray for the numpy fast path. Either way, using numpy should be opt-in.

Also, here is a benchmark showing that appending data is faster with python lists: https://towardsdatascience.com/python-lists-are-sometimes-much-faster-than-numpy-heres-a-proof-4b3dad4653ad I would also guess that python lists could be faster than numpy for small lists/arrays (since native calls are not needed; dunno if those are similarly expensive to JNI calls)?

Here is another benchmark with random numbers: https://towardsdatascience.com/is-numpy-really-faster-than-python-aaa9f8afb5d7 It seems the break-even is at about 20 elements for this specific case (see the chart in the article).

cnuernber commented 2 years ago

I could see lists working better for small things for sure - very small. Honestly I like the opt-in approach the most, as it is the simplest from my end and the opt-in pattern works in general. I think for 99% of the meaningful use cases it will be the slowest option, especially if you are running a model or something, as you have to go to numpy anyway - but as you said, users can solve that.

The deeper integration and especially zero-copy are distinct advantages of using libpython-clj or libjulia-clj, but they are specializations. And the fastest pathway would involve preallocating a set of numpy arrays and copying into them repeatedly, not just recreating them by setting globals.

I am fine backing off and going back to lists for setGlobal - that is at least standardized, in the sense that it behaves the same as function calls.

cnuernber commented 2 years ago

Release 2.013 is up that disables the auto-numpy pathway of setGlobal.

subes commented 2 years ago

setGlobal now works great. However, runStringAsInput now returns a Pointer {:address 0x00007F35520C97C0 } instead of a byte[] or byte[][]. The workaround demonstrated below works:

@Override
public Object get(final String variable) {
    IScriptTaskRunnerPython.LOG.debug("get %s", variable);
    gilLock.lock();
    try {
        //does not work due to a pointer being returned:
        //return libpython_clj2.java_api.runStringAsInput(variable);
        //workaround works:
        libpython_clj2.java_api.runStringAsFile("__ans__ = " + variable);
        return libpython_clj2.java_api.getGlobal("__ans__");
    } finally {
        gilLock.unlock();
    }
}

It would be great to also get the conversions of getGlobal into runStringAsInput, so I don't need the extra call to keep the test cases green.

cnuernber commented 2 years ago

Sure, also to have a consistent API. You are running into something that libpython-clj supports: by default it allows both proxying python objects into Java and copying them. runStringAsFile wasn't wrapping the return value correctly.

What getGlobal and runStringAsFile do is return proxied objects. There is another api fn, copyToJVM, that will always ensure the data is in the JVM in the correct format. This is optimized for the case of returning lists or things such as nested json objects. In any case, both will return an implementation of java.util.List, whether it is proxied or not.
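For example (a sketch; GIL held, "result" is an arbitrary global):

// getGlobal may hand back a proxy that still references the live python object:
final Object proxied = libpython_clj2.java_api.getGlobal("result");
// copyToJVM forces a full conversion into plain JVM data structures:
final Object copied = libpython_clj2.java_api.copyToJVM(proxied);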

2.014 is up and contains a fix for runStringAsInput.

subes commented 2 years ago

This works great now. I updated the benchmarks: https://github.com/invesdwin/invesdwin-context-python/blob/master/README.md#results

And here is the updated benchmark for the fastcallable function optimization:

public class PythonStrategy extends StrategySupport {

    private final String instrumentId;
    private IScriptTaskEngine pythonEngine;
    private ITickCache tickCache;
    private int countPythonCalls = 0;
    private Instant start;
    private Instant lastLog;
    private AutoCloseable calcSpread;
    private ILock gilLock;

    public PythonStrategy(final String instrumentId) {
        this.instrumentId = instrumentId;
    }

    @Override
    public void onInit() {
        tickCache = getBroker().getInstrumentRegistry()
                .getInstrumentOrThrow(instrumentId)
                .getDataSource()
                .getTickCache();
    }

    @Override
    public void onStart() {
        //        pythonEngine = Py4jScriptTaskEnginePython.newInstance();
        //        pythonEngine = JythonScriptTaskEnginePython.newInstance();
        //        pythonEngine = JepScriptTaskEnginePython.newInstance();
        pythonEngine = LibpythoncljScriptTaskEnginePython.newInstance();

        pythonEngine.eval("def calcSpread(bid,ask):\n\treturn abs(ask-bid)\n\n");
        gilLock = pythonEngine.getSharedLock();
        gilLock.lock();
        final IPythonEngineWrapper unwrap = (IPythonEngineWrapper) pythonEngine.unwrap();
        final IFn calcSpreadFunction = (IFn) unwrap.get("calcSpread");
        calcSpread = libpython_clj2.java_api.makeFastcallable(calcSpreadFunction);

        start = new Instant();
        lastLog = new Instant();
    }

    @Override
    public void onTickTime() {
        final ATick lastTick = tickCache.getLastTick(null);
        final double pythonSpread = Doubles.checkedCast(
                libpython_clj2.java_api.call(calcSpread, lastTick.getAskAbsolute(), lastTick.getBidAbsolute()));
        countPythonCalls++;
        Assertions.checkEquals(lastTick.getSpreadAbsolute(), pythonSpread);
        if (lastLog.isGreaterThan(Duration.ONE_SECOND)) {
            //CHECKSTYLE:OFF
            System.out.println("Python Calls: " + new ProcessedEventsRateString(countPythonCalls, start.toDuration()));
            //CHECKSTYLE:ON
            lastLog = new Instant();
        }
    }

    @Override
    public void onStop() {
        if (pythonEngine != null) {
            try {
                calcSpread.close();
            } catch (final Exception e) {
                throw new RuntimeException(e);
            }
            gilLock.unlock();
            pythonEngine.close();
            pythonEngine = null;
        }
    }

}

333.32/ms python calls with 271.22/ms ticks (about 3x more ticks per second due to fewer python calls per tick)

subes commented 2 years ago

Here is a benchmark that keeps the GIL locked without the fastcallable function:

public class PythonStrategy extends StrategySupport {

    private final String instrumentId;
    private IScriptTaskEngine pythonEngine;
    private ITickCache tickCache;
    private int countPythonCalls = 0;
    private Instant start;
    private Instant lastLog;
    private ILock gilLock;

    public PythonStrategy(final String instrumentId) {
        this.instrumentId = instrumentId;
    }

    @Override
    public void onInit() {
        tickCache = getBroker().getInstrumentRegistry()
                .getInstrumentOrThrow(instrumentId)
                .getDataSource()
                .getTickCache();
    }

    @Override
    public void onStart() {
        //        pythonEngine = Py4jScriptTaskEnginePython.newInstance();
        //        pythonEngine = JythonScriptTaskEnginePython.newInstance();
        //        pythonEngine = JepScriptTaskEnginePython.newInstance();
        pythonEngine = LibpythoncljScriptTaskEnginePython.newInstance();
        gilLock = pythonEngine.getSharedLock();
        gilLock.lock();
        start = new Instant();
        lastLog = new Instant();
    }

    @Override
    public void onTickTime() {
        final ATick lastTick = tickCache.getLastTick(null);
        pythonEngine.getInputs().putDouble("ask", lastTick.getAskAbsolute());
        countPythonCalls++;
        pythonEngine.getInputs().putDouble("bid", lastTick.getBidAbsolute());
        countPythonCalls++;
        pythonEngine.eval("spread = abs(ask-bid)");
        countPythonCalls++;
        final double pythonSpread = pythonEngine.getResults().getDouble("spread");
        countPythonCalls++;
        Assertions.checkEquals(lastTick.getSpreadAbsolute(), pythonSpread);
        if (lastLog.isGreaterThan(Duration.ONE_SECOND)) {
            //CHECKSTYLE:OFF
            System.out.println("Python Calls: " + new ProcessedEventsRateString(countPythonCalls, start.toDuration()));
            //CHECKSTYLE:ON
            lastLog = new Instant();
        }
    }

    @Override
    public void onStop() {
        if (pythonEngine != null) {
            gilLock.unlock();
            pythonEngine.close();
            pythonEngine = null;
        }
    }

}

614.23/ms python calls with 139.61/ms ticks

subes commented 2 years ago

This is 2-3 times faster than Jep. So really good job there! You definitely improved the Python integration landscape available for the JVM. This is a win for the Java community and a great achievement for you personally. :)

The only remaining reason to use Jep instead of libpython-clj is if one wants to do multithreading with sub-interpreters; that is the only situation where Jep can be faster right now. I guessed each interpreter would have its own GIL, though according to this, multiple interpreters seem to share the same GIL: https://github.com/ninia/jep/wiki/Jep-and-the-GIL

Also, I am looking into integrating Clojure via https://github.com/ato/clojure-jsr223 to make libraries like scicloj or tech.ml.dataset available. With that there might be no need to write a Java binding for these libraries, apart from allowing usage by people who don't want to write Clojure. Though I dunno how the Clojure ScriptEngine performs; I guess if it reuses the compiled scripts it should work ok.
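Getting a Clojure engine via JSR-223 would look roughly like this (a sketch; I'm assuming clojure-jsr223 registers itself under the name "Clojure" - check the library's service metadata):

import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

public class ClojureEngineSketch {
    public static void main(final String[] args) throws ScriptException {
        // assumes clojure-jsr223 is on the classpath; the engine name is an assumption
        final ScriptEngine clj = new ScriptEngineManager().getEngineByName("Clojure");
        final Object result = clj.eval("(reduce + (range 10))");
        System.out.println(result); // 45
    }
}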

subes commented 2 years ago

Just found another alternative for integrating python, via JavaCPP (https://github.com/bytedeco/javacpp-presets/tree/master/cpython). They also have a sort-of integration for scipy: https://github.com/bytedeco/javacpp-presets/tree/master/scipy

Though I don't know how usable that is; it seems very low-level.

cnuernber commented 2 years ago

tech.ml.dataset is a pandas equivalent, something that doesn't otherwise exist on the JVM. It has extremely fast grouping, aggregations, and joins. The issue with making a Java API for it is that its API is very broad, as is pandas' or dplyr's. You would end up with, I think, hundreds of Java functions, and without someone definitively saying they would attempt it and make the javadoc nice and everything, it would just be a ton of work when you can use things from Clojure and be done with it.

scicloj and the ml subsystem are, I think, a lot more generally useful, but you need tmd for those... In your space, scicloj allows people to efficiently use xgboost, which is a damn fast and pretty good general model for lots of problems. In addition you can use any model from scipy or smile (from before the GPL switch - I didn't know about that!), but that in general means setting up a new ML community with docs and everything; we tried it and it is a ton of work. And again, what is the value-add of going to Java when the system is designed for Clojure anyway? You are talking about years of work, or at least a year, when you could just learn Clojure, do what you need to, and be done with it.

My concern with JavaCPP is, and always has been, hardcoding to a specific python version or environment. Perhaps it finds Python, but I don't think so; the system we built over a lot of time and tears to find python across various environments such as pyenv and conda means a lot of things just work that absolutely drive you crazy with other python integrations.

Thanks for your patience with this! I love these types of benchmarks, but it always takes a while to figure them out and put the best foot forward! This is truly a fantastic issue, and now libpython-clj has a solid Java API that we know is well thought out and efficient. That is a big step forward. The Java API, btw, has features that the public Clojure one doesn't (specifically the fast execution of scripts and manual GIL control), so that is interesting.

subes commented 2 years ago

Smile is still LGPL; I checked again and the files in smile-core, for example, still contain the LGPL header. I think they only put specific parts under GPL (something called Smile Shell, I think, but I did not check further). The main license file in the repo is now just a bit confusing, because it gives the impression that it is GPL. Neither the website nor the documentation says what the license situation is.

And yes, I agree that using the Clojure integration is easier. I don't know if the API would even translate well.

cnuernber commented 2 years ago

As far as steps forward for the JVM go, I think the Julia integration is also key. The JVM just doesn't create great vectorized code, and that is something the Julia compiler does very well. Julia really is complementary to the JVM, while Python is interesting solely due to the libraries; anything done in Python could be done in the JVM, while for Julia that isn't true. For some really out-there stuff, check out kmeans-mnist. The implementation is a really tight integration, passing the data zero-copy between Julia and the JVM and using each where it has the strongest advantage, sort of weaving between the two of them.

I also tried out TVM for a while, but it just isn't general enough to really take the JVM forward. Julia is, however.

For optimizing literally any computational problem, I think this paper really hits the nail on the head, specifically pages 34-43.

subes commented 2 years ago

Seems this was wrong again. Version 2.6.0 is LGPL: (screenshot)

The current snapshot version is GPL: https://github.com/haifengl/smile/blob/master/core/src/main/java/smile/association/AssociationRule.java (screenshot)

cnuernber commented 2 years ago

Yep, Haifeng just wants to get paid for his work and hasn't figured out how to do that. Hopefully he finds success via dual licensing but I think that is pretty tough.

cnuernber commented 2 years ago

In fact, if I could get a group of people together, I would love to get a JVM version of Julia working using JDK-17's vector intrinsics. I think it is now possible to equal the speed of other systems, especially if we can take the LLVM vectorized bytecode and convert it to JDK vector intrinsics. So you would have the same Julia compiler frontend and just have it output optimized JVM bytecode.
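For reference, explicit vectorization with the JDK-17 Vector API (incubating; run with --add-modules jdk.incubator.vector) looks like this - a minimal saxpy sketch of the kind of kernel such a backend could emit:

import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorSpecies;

public final class Saxpy {
    private static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    // y = a*x + y, processed SPECIES.length() lanes at a time
    public static void saxpy(final float a, final float[] x, final float[] y) {
        int i = 0;
        final int upper = SPECIES.loopBound(x.length);
        final FloatVector va = FloatVector.broadcast(SPECIES, a);
        for (; i < upper; i += SPECIES.length()) {
            final FloatVector vx = FloatVector.fromArray(SPECIES, x, i);
            final FloatVector vy = FloatVector.fromArray(SPECIES, y, i);
            vx.fma(va, vy).intoArray(y, i); // fused multiply-add: x*a + y
        }
        for (; i < x.length; i++) { // scalar tail
            y[i] = a * x[i] + y[i];
        }
    }
}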

cnuernber commented 2 years ago

Oh, it's all LGPL - that is huge. Thanks for finding that out; that means upgrading is OK :-).

subes commented 2 years ago

Why would upgrading be ok when they switch from LGPL to GPL? Upgrading to 2.6.0 is fine; beyond that, not anymore.

cnuernber commented 2 years ago

Sorry, I misread it. You are right, GPL is a bad deal for most businesses who are trying to make money from software. Yep, thanks, I think we are at 2.6.0.

subes commented 2 years ago

Something like Renjin for Julia might be an interesting idea. Though Renjin has demand because R is horribly slow, and Julia does not seem to be slow; dunno if people use JRuby because of the speed improvements either.

Though being able to mix Julia code with Java directly could be beneficial.

cnuernber commented 2 years ago

I think it could be interesting. We can do that now, just through libjulia-clj. But it has some drawbacks: Julia at times does things with the call stack, so callbacks back into the JVM don't always work, regardless of the integration.

Julia's compiler is about 100 times more involved than R's, so I think trying to rebuild their compiler is a minefield; but LLVM used to have a JVM bytecode generation pathway, and it could always be rebuilt. With JDK-17 vector intrinsics, or even without, some of those optimizations are pretty powerful.

subes commented 2 years ago

I guess you mean this bytecode generator: https://github.com/davidar/lljvm - or is there a more up-to-date one?

How does TVM compare to TornadoVM, Aparapi or rootbeer1? Is it only for matrix functions, or can one also e.g. write a backtesting engine on the GPU with it? From what I have seen of these automagic translations from Java bytecode to OpenCL/CUDA, there is always some quirkiness or incomprehensible problems. So for now I thought a better approach might be to code an OpenCL/CUDA program directly and integrate it on a higher level, instead of trying to mix computation between CPU/GPU on a function level (similar to the other language integrations that I am doing).

Regarding the machine learning backtests: currently I am already quite fast on the CPU with the generated strategies (which can also be easily scaled via cluster/cloud computing). I have to work further on robustness techniques to combat curve fitting for now. But at some point it will be interesting to also use the engine on tick data or larger portfolios. That is somewhere I think the performance could be improved by at least 10x if one goes from CPU to GPU in an intelligent way.

Regarding monetization of open source products (as e.g. haifeng might be trying now):

I will read that paper you suggested later, thanks for the link! I am always interested to read interesting things.

cnuernber commented 2 years ago

With TVM you literally program an AST and then ask TVM to compile that AST in various different ways. So for something where you control the code generation (a generated strategy) it may work, but it isn't a general programming language, so it doesn't support everything you can think of. It is a very specialized language that keeps you within the bounds of what can be translated to a GPU.

Those other methods (impressively!) look like they take bytecode and auto-generate the GPU bindings. With TVM you have to program the AST yourself. This requires you to know the TVM micro-language and stick to it, both of which are quite tough IMO.

For TVM you have three steps.

  1. Define the algorithm using the TVM AST.
  2. Schedule the algorithm, which applies various transformations that do things like loop fusion.
  3. Compile the algorithm to the hardware of your choice.

Definitely nothing automatic about it. And when I tested CPU pathways against Julia I found I was able to get more performance for at least mnist via Julia.

I was, however, able to create a simple image resize algorithm that performed better than OpenCV across both CPU and GPU (both CUDA and OpenCL, with CUDA being notably faster), but this took a fascinating yet nearly heroic effort.

TVM is specialized heavily towards optimizing neural network processing graphs so if your problem looks like a convolutional neural network then it will be better than anything else. Again my honest opinion based on my various explorations is that for general purpose processing Julia beats everything else.

cnuernber commented 2 years ago

I haven't found a better LLVM wrapper, and the one you point to is, I believe, pretty far out of date.

cnuernber commented 2 years ago

@subes - Found a tensorflow quant finance library - https://github.com/google/tf-quant-finance

Seems relevant to your interests :-)

subes commented 2 years ago

Thanks, looks interesting. Nowadays there is also a tensorflow binding for Java: https://www.tensorflow.org/jvm/install - I will put this into my tickets as a reminder. I want to integrate neural networks and deep learning into the process and add functions to train/use such algorithms in my expression language. The neural networks are used for price forecasting (I have more forecasting techniques in my backlog) or to generate artificial-intelligence indicators. These can either be used as an alternative to the genetic programming I am currently doing, or be embedded into the genetic programming (to create advanced hybrid processes).

Since I already have 3 suitable generators, I am currently focusing on other parts of the process. Currently I am working on machine-learning- and solver-based portfolio algorithms. Among others, I have implemented these over the last few weeks and am currently testing them in my automated processes:

I have a pitch for you: the platform is supposed to become something like WEKA, but for financial trading strategies and portfolio management. It is supposed to be free for researchers, though invitation/request only at the moment. I am working together with my university to maybe acquire funding via grants in order to get some student assistant positions for this. And it might be useful as a teaching instrument in economics courses once the UI is finished. Though this is just the more open path this can take; I am also following some more proprietary alpha-seeking paths with it for institutional clients. If you think you could be interested in such things, we could do a screen-sharing session sometime, and if you like it I can give you access. ;)

cnuernber commented 2 years ago

The problem with the Java tensorflow binding is that it doesn't include the python layers, which have a ton of functionality. For instance that library, tf-quant-finance, relies on tensorflow-probability, which is a nontrivial python layer on top of tensorflow. So to get meaningful support of tensorflow on the JVM you need to run the python layers. This is also true for mxnet, which I like a lot better architecturally - their add-on systems for language processing, for example, are very nontrivial and completely done in python. pytorch is another example, although for whatever reason pytorch doesn't work well with libpython-clj.

I would love a screenshare to see the app in its full glory. Do you have an email address you like to use for these things?

subes commented 2 years ago

gsubes@gmail.com or edwinstang@gmail.com is fine. I guess we could do the screen share in zoom. I am also on skype (gsubes). Just pick a time slot and a medium that suits you well.