clj-python / libpython-clj

Python bindings for Clojure
Eclipse Public License 2.0

Make JNA Binding available to Java clients #191

Closed subes closed 2 years ago

subes commented 2 years ago

As discussed here: https://github.com/cnuernber/libjulia-clj/issues/3 Please generate some Java classes for libpython-clj so I can integrate it here: https://github.com/invesdwin/invesdwin-context-python/tree/master/invesdwin-context-python-parent/invesdwin-context-python-runtime-libpythonclj

Thanks a lot!

cnuernber commented 2 years ago

This is great :-). Is this going to be used in the same classloader as the libjulia-clj java bindings?

The reason I ask is because if so, then I need to release a dtype-next aot version and make the libjulia and this python version dependent on that so that we don't duplicate class files all over the place - the libjulia aot version I made had the required dtype-next class files generated inline.

For the first test I can just regenerate the dtype-next classes but after we confirm this is working I will need to release aot versions of dependent libraries if this is going to be used in the same classloader as the libjulia-clj java bindings.

cnuernber commented 2 years ago

Initial jar - https://clojars.org/clj-python/libpython-clj/versions/2.004-aot

API docs - https://clj-python.github.io/libpython-clj/libpython-clj2.java-api.html

The initialize function calls the python executable from command line and gets it to output the setup information. Python has a bit more involved setup than just a shared library path as it also needs to know some level of module root path. Let's see if this works and we can tweak.

Small unit test showing functionality - https://github.com/clj-python/libpython-clj/blob/master/test/libpython_clj2/java_api_test.clj

subes commented 2 years ago

Thanks a lot, I will test it as soon as possible.

subes commented 2 years ago

I made some progress here: https://github.com/invesdwin/invesdwin-context-python/blob/master/invesdwin-context-python-parent/invesdwin-context-python-runtime-libpythonclj/src/main/java/de/invesdwin/context/python/runtime/libpythonclj/internal/PythonEngine.java

Though I can't figure out how to get/set globals. I get this exception:

2021-12-28 14:59:13.994 [ |7-7:InputsAndResult] ERROR de.invesdwin.ERROR.process                                   - processing #00000007
de.invesdwin.context.log.error.LoggedRuntimeException: #00000007 clojure.lang.ArityException: Wrong number of args (3) passed to: libpython-clj2.python/get-item
        ... 12 omitted, see following cause or error.log
Caused by - clojure.lang.ArityException: Wrong number of args (3) passed to: libpython-clj2.python/get-item
        at clojure.lang.AFn.throwArity(AFn.java:429)
        at clojure.lang.AFn.invoke(AFn.java:40)
        at libpython_clj2.java_api$_setItem.invokeStatic(java_api.clj:106)
        at libpython_clj2.java_api$_setItem.invoke(java_api.clj:104)
        at libpython_clj2.java_api.setItem(Unknown Source)
      * at de.invesdwin.context.python.runtime.libpythonclj.internal.PythonEngine.set(PythonEngine.java:48) *
      * at de.invesdwin.context.python.runtime.libpythonclj.LibpythoncljScriptTaskInputsPython.putString(LibpythoncljScriptTaskInputsPython.java:57) *
      * at de.invesdwin.context.python.runtime.contract.hello.HelloWorldScript$1.populateInputs(HelloWorldScript.java:29) *
      * at de.invesdwin.context.python.runtime.libpythonclj.LibpythoncljScriptTaskRunnerPython.run(LibpythoncljScriptTaskRunnerPython.java:42) *
      * at de.invesdwin.context.python.runtime.contract.AScriptTaskPython.run(AScriptTaskPython.java:12) *
      * at de.invesdwin.context.python.runtime.contract.hello.HelloWorldScript.testHelloWorld(HelloWorldScript.java:45) *
      * at de.invesdwin.context.python.runtime.contract.InputsAndResultsTests.test(InputsAndResultsTests.java:23) *
      * at de.invesdwin.context.python.runtime.contract.InputsAndResultsTests$1.run(InputsAndResultsTests.java:47) *
      * at de.invesdwin.util.concurrent.internal.WrappedRunnable.run(WrappedRunnable.java:47) *
        ... 8 more, see error.log

getAttr/setAttr also does not work:

2021-12-28 15:03:03.024 [ |7-2:InputsAndResult] ERROR de.invesdwin.ERROR.process                                   - processing #00000002
de.invesdwin.context.log.error.LoggedRuntimeException: #00000002 java.lang.Exception: AttributeError: 'dict' object has no attribute 'hello'

        ... 12 omitted, see following cause or error.log
Caused by - java.lang.Exception: AttributeError: 'dict' object has no attribute 'hello'

        at libpython_clj2.python.ffi$check_error_throw.invokeStatic(ffi.clj:687)
        at libpython_clj2.python.ffi$check_error_throw.invoke(ffi.clj:685)
        at libpython_clj2.python.base$fn__10297.invokeStatic(base.clj:144)
        at libpython_clj2.python.base$fn__10297.invoke(base.clj:114)
        at libpython_clj2.python.protocols$fn__10100$G__10095__10109.invoke(protocols.clj:57)
        at libpython_clj2.python.bridge_as_jvm$generic_python_as_map$reify__10853$fn__10856.invoke(bridge_as_jvm.clj:281)
        at libpython_clj2.python.bridge_as_jvm$generic_python_as_map$reify__10853.set_attr_BANG_(bridge_as_jvm.clj:281)
        at libpython_clj2.python$set_attr_BANG_$fn__11372.invoke(python.clj:199)
        at libpython_clj2.python$set_attr_BANG_.invokeStatic(python.clj:199)
        at libpython_clj2.python$set_attr_BANG_.invoke(python.clj:196)
        at libpython_clj2.java_api$_setAttr.invokeStatic(java_api.clj:85)
        at libpython_clj2.java_api$_setAttr.invoke(java_api.clj:82)
        at libpython_clj2.java_api.setAttr(Unknown Source)
      * at de.invesdwin.context.python.runtime.libpythonclj.internal.PythonEngine.set(PythonEngine.java:48) *
      * at de.invesdwin.context.python.runtime.libpythonclj.LibpythoncljScriptTaskInputsPython.putString(LibpythoncljScriptTaskInputsPython.java:57) *
      * at de.invesdwin.context.python.runtime.contract.hello.HelloWorldScript$1.populateInputs(HelloWorldScript.java:29) *
      * at de.invesdwin.context.python.runtime.libpythonclj.LibpythoncljScriptTaskRunnerPython.run(LibpythoncljScriptTaskRunnerPython.java:42) *
      * at de.invesdwin.context.python.runtime.contract.AScriptTaskPython.run(AScriptTaskPython.java:12) *
      * at de.invesdwin.context.python.runtime.contract.hello.HelloWorldScript.testHelloWorld(HelloWorldScript.java:45) *
      * at de.invesdwin.context.python.runtime.contract.InputsAndResultsTests.test(InputsAndResultsTests.java:23) *
      * at de.invesdwin.context.python.runtime.contract.InputsAndResultsTests$1.run(InputsAndResultsTests.java:47) *
      * at de.invesdwin.util.concurrent.internal.WrappedRunnable.run(WrappedRunnable.java:47) *
        ... 8 more, see error.log

I also tried just using "globals" as a string instead of using the returned value from the persistent map from runString, but that fails also:

2021-12-28 15:04:17.408 [ |7-10:InputsAndResul] ERROR de.invesdwin.ERROR.process                                   - processing #00000010
de.invesdwin.context.log.error.LoggedRuntimeException: #00000010 clojure.lang.ArityException: Wrong number of args (3) passed to: libpython-clj2.python/get-item
        ... 12 omitted, see following cause or error.log
Caused by - clojure.lang.ArityException: Wrong number of args (3) passed to: libpython-clj2.python/get-item
        at clojure.lang.AFn.throwArity(AFn.java:429)
        at clojure.lang.AFn.invoke(AFn.java:40)
        at libpython_clj2.java_api$_setItem.invokeStatic(java_api.clj:106)
        at libpython_clj2.java_api$_setItem.invoke(java_api.clj:104)
        at libpython_clj2.java_api.setItem(Unknown Source)
      * at de.invesdwin.context.python.runtime.libpythonclj.internal.PythonEngine.set(PythonEngine.java:48) *
      * at de.invesdwin.context.python.runtime.libpythonclj.LibpythoncljScriptTaskInputsPython.putString(LibpythoncljScriptTaskInputsPython.java:57) *
      * at de.invesdwin.context.python.runtime.contract.hello.HelloWorldScript$1.populateInputs(HelloWorldScript.java:29) *
      * at de.invesdwin.context.python.runtime.libpythonclj.LibpythoncljScriptTaskRunnerPython.run(LibpythoncljScriptTaskRunnerPython.java:42) *
      * at de.invesdwin.context.python.runtime.contract.AScriptTaskPython.run(AScriptTaskPython.java:12) *
      * at de.invesdwin.context.python.runtime.contract.hello.HelloWorldScript.testHelloWorld(HelloWorldScript.java:45) *
      * at de.invesdwin.context.python.runtime.contract.InputsAndResultsTests.test(InputsAndResultsTests.java:23) *
      * at de.invesdwin.context.python.runtime.contract.InputsAndResultsTests$1.run(InputsAndResultsTests.java:47) *
      * at de.invesdwin.util.concurrent.internal.WrappedRunnable.run(WrappedRunnable.java:47) *
        ... 8 more, see error.log

subes commented 2 years ago

Ah, this is a lot simpler than expected: one just has to put into and get from the PersistentMap.

cnuernber commented 2 years ago

True, but there isn't a way to get the globals map in a stand-alone fashion which I can see would be useful. And you did find an error in the api :-).

I tried to expose the python objects as their java equivalents, so a python dict will be returned as an implementation of java.util.Map, and tuples/lists implement java.util.List and java.util.RandomAccess, etc. On the Clojure side there is a somewhat complex pathway that distinguishes between copying a python value completely into the JVM, such as copying a JSON object, and bridging/proxying it, which is what people want to do with modules.

cnuernber commented 2 years ago

Also I want to expose a withGIL function so you can capture the GIL once and do a set of things. This is similar to inContext with the exception that it doesn't attempt to release all objects allocated within the scope.

subes commented 2 years ago

This is how I get the globals right now:

    @SuppressWarnings("unchecked")
    private Map<Object, Object> getGlobals() {
        final Map<?, ?> mainModule = libpython_clj2.java_api.runString("");
        return (Map<Object, Object>) mainModule.get("globals");
    }

cnuernber commented 2 years ago

Yep, that will work.

subes commented 2 years ago

And yes, withGIL would be great. Though a lock/unlock function would be better so I can do:

gilLock.lock();
try{
...
} finally{
    gilLock.unlock();
}

cnuernber commented 2 years ago

OK, makes sense. Then you can make your own withGIL if you need it...

subes commented 2 years ago

I would just wrap it in an implementation of ILock (which my client code already uses).

cnuernber commented 2 years ago

New API is up - it has two new functions for GIL management (as well as a fixed setItem call) - https://clojars.org/clj-python/libpython-clj/versions/2.004-aot-1.

Note that the python ensureGIL call returns an integer that must be passed into releaseGIL so ensureGIL is 'reentrant' but you would have to keep track of that integer in your ILock impl.

New api fns are int lockGIL() and void unlockGIL(int).
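
A minimal sketch of how the integer token could be tracked inside a lock wrapper, so that nested lock/unlock pairs hand the right token back. Everything here is hypothetical except the two function shapes named above: the wrapper takes the acquire and release calls as parameters, and wiring them to java_api::lockGIL and java_api::unlockGIL is an assumption about how a client would use it, not something the library provides.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.IntConsumer;
import java.util.function.IntSupplier;
import java.util.function.Supplier;

/** Hypothetical wrapper that remembers the token returned by each acquire. */
final class GilTokenLock {
    private final IntSupplier acquire; // assumed wiring: java_api::lockGIL
    private final IntConsumer release; // assumed wiring: java_api::unlockGIL
    // One token stack per thread keeps nested lock()/unlock() pairs matched.
    private final ThreadLocal<Deque<Integer>> tokens =
            ThreadLocal.withInitial(ArrayDeque::new);

    GilTokenLock(IntSupplier acquire, IntConsumer release) {
        this.acquire = acquire;
        this.release = release;
    }

    void lock() {
        tokens.get().push(acquire.getAsInt());
    }

    void unlock() {
        Deque<Integer> stack = tokens.get();
        if (stack.isEmpty()) {
            throw new IllegalStateException("unlock() without matching lock()");
        }
        // Hand back the most recently acquired token first.
        release.accept(stack.pop());
    }

    /** A withGIL-style helper built on top of lock()/unlock(). */
    <T> T withGil(Supplier<T> body) {
        lock();
        try {
            return body.get();
        } finally {
            unlock();
        }
    }
}
```

With this shape, client code can keep the lock()/try/finally/unlock() idiom and never sees the integer; an ILock adapter would simply delegate to lock() and unlock().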

subes commented 2 years ago

Why doesn't this:

final Map<String, Object> initParams = new HashMap<>();
initParams.put("python-executable", "someWrongCommand");
libpython_clj2.java_api.initialize(initParams);

throw an exception? I have no PYTHON_HOME env variable set. Also I guess python2 and pypy are not supported?

cnuernber commented 2 years ago

Good question. It should throw. Will check in a moment.

For sure python2 isn't supported and probably not pypy as I am sitting directly on the shared library. Also not all python distributions even come with the shared library, a lot of them just compile the symbols into the python executable which is why we have an embedded pathway so you can boot up your system from python itself.

On top of that there are like 5 package managers for python and various ones require various tweaks - We have collected what we know into the environments document.
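
Until initialize reliably throws for a bad "python-executable", a caller could check the command up front. The sketch below is a client-side workaround using only the JDK, not part of libpython-clj; the class and method names are made up for illustration, and it assumes the executable accepts a --version flag.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

/** Hypothetical pre-check; not part of the libpython-clj API. */
final class PythonExecutableCheck {
    /** True only if "<executable> --version" starts and exits with status 0 within 10 seconds. */
    static boolean isRunnable(String executable) {
        Process process = null;
        try {
            process = new ProcessBuilder(executable, "--version")
                    .redirectErrorStream(true)
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .start();
            return process.waitFor(10, TimeUnit.SECONDS) && process.exitValue() == 0;
        } catch (IOException | SecurityException e) {
            return false; // command not found, not executable, or not permitted
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            if (process != null) {
                process.destroy();
            }
        }
    }
}
```

A caller would then fail fast with its own exception when isRunnable("someWrongCommand") returns false, before handing the init map to initialize.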

cnuernber commented 2 years ago

New release is up (as is my son so I have to run for a bit :-) ) - https://clojars.org/clj-python/libpython-clj/versions/2.005

This release does runtime AOT so you should see the same 4 seconds as the libjulia version. One pathway we have optimized fairly thoroughly is taking a large nested JSON object and converting it to a JVM datastructure. Aside from that I will be curious as to how libpython-clj stacks up.

Panthera was built against a much older version of libpython-clj2. I can reach out to Alan and see if he is interested in taking it further, but to my knowledge there is no use case where it is faster than tech.ml.dataset and many where it is slower. That said, especially for your users, there are so many libraries available for pandas, particularly for quant stuff, that this may make the difference.

Here is a quick test of the system from the java api perspective.

subes commented 2 years ago

The new release is missing the java class libpython_clj2.java_api: image

cnuernber commented 2 years ago

Sorry - I made a mistake in the jar definition! I checked and the new release has the class files - https://clojars.org/clj-python/libpython-clj.

subes commented 2 years ago

2.006 works, thanks!

Would it be possible to sandbox interpreters in libpython-clj like it is possible in Jep (http://ninia.github.io/jep/javadoc/3.9/jep/SubInterpreter.html)? The goal would be to be able to pool interpreter instances. In Jep one has to additionally bind them to a thread or else one gets issues with GIL. Though the idea would be to have separate globals state per interpreter and do multithreading from java. Python could itself decide which thread needs access to the GIL. Native python modules like numpy do most of their work without having a lock on GIL, so letting the python code decide when to acquire the GIL in a finer granularity could be faster for multi-threading with multiple interpreters.

Currently when I disable exclusive locking with libpython-clj the state between threads gets mixed which makes the testcases red (since I guess all use the same interpreter and share globals). So threads have to use the python binding one after the other. The design doc talks about multiple interpreters, but I don't see an API to use that: https://clj-python.github.io/libpython-clj/design.html

Would also be interesting if a sandboxing like this could be possible with libjulia-clj.
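
As long as there is a single interpreter with shared globals, one way to keep state from mixing without sprinkling exclusive locks through client code is to confine every Python call to one worker thread. This is a generic Java pattern sketched under that assumption, not an API of libpython-clj or Jep; the class name is hypothetical and the tasks would wrap whatever binding calls the client makes.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical gateway that runs every task on one dedicated thread. */
final class SingleInterpreterGateway implements AutoCloseable {
    private final ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "python-interpreter");
        t.setDaemon(true);
        return t;
    });

    /** Runs the task on the worker thread and blocks until it finishes. */
    <T> T call(Callable<T> task) throws Exception {
        try {
            return worker.submit(task).get();
        } catch (ExecutionException e) {
            // Surface the task's own exception instead of the wrapper.
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            throw e;
        }
    }

    @Override
    public void close() {
        worker.shutdown();
    }
}
```

This serializes calls exactly like the exclusive lock does, so it buys no parallelism; it only makes the one-after-the-other rule structural. Separate globals per thread would still need real sub-interpreter support.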

subes commented 2 years ago

Regarding panthera, scicloj.ml or a future binding for keras:

Doesn't libpython-clj support generating clojure bindings for python APIs? Maybe this generator could be extended to also generate bindings for java? Or are those generated bindings not high level enough and that is the reason why someone needs to package them as panthera or scicloj.ml with some manually coded sugar? The question also goes into the direction: should a java binding be made for the panthera/scicloj.ml like you did for libjulia-clj and libpython-clj with a generated static wrapper/facade, or can this be done at a different layer?

cnuernber commented 2 years ago

These are great thoughts, glad we have the start of something going.

SubInterpreters

Your explanation makes sense to me. I have stayed away from this pathway because no one has asked for it and it carries some level of operational risk. If there are multiple interpreters and someone does Object np = java_api.importModule( "numpy" ); then there is a question as to which interpreter they are talking to, and it would be easy for someone to then call a method with data from the wrong interpreter. I agree with your reasoning and it matches my understanding -- my prediction is that it takes a somewhat more advanced user to be able to get any concrete gains with that method.

That design document was written very early and the very first version of libpython-clj contained some basic multi-interpreter support but no one used it and the way they were using libpython-clj meant multiple interpreters were going to cause issues. I should correct the document.

Honestly I would love to add that as it is the kind of thing I like to do. It would take some thought and careful engineering but probably not that many lines of code.

Java Wrappers for Python Libraries

libpython-clj contains a pretty good system for generating meaningful metadata from python libraries. Upon this system we built a runtime code generation facility - require-python and the static code generation facility you mentioned.

It wouldn't take too much effort to make a java library from the same metadata and it would have good javadoc comments but it would be primarily typeless so a bit of an odd java interface. In terms of panthera it was written before the static code generation facility - the code generation now is good enough IMO that it isn't a requirement. I am not sure how member variables of python instances would translate so my guess is that for a high quality library you would still need to wrap various class types to make the member functions clear to intellisense if that is important.

Large Features Missing From Current Java Bindings

Many of the things mentioned above will need to be carefully thought through in terms of their interaction with multiple interpreters.

subes commented 2 years ago

I guess such large refactorings/redesigns are not too high priority. So if you want to do them, I will test/incorporate them. Though if someone wants that functionality, they could just use Jep instead of libpython-clj. Jep also solves the modules thing via a feature called "shared modules", though also with some warnings.

I was now able to do some benchmarks: https://github.com/invesdwin/invesdwin-context-python/blob/master/README.md#results

It seems the performance is rather bad because of some overhead in clojure, though more significantly some inefficient native string parsing: image

Dunno what that code exactly does, but I guess a map to lookup functions (since I guess this is what the code does) by hash could improve the performance a lot. Or maybe it is the overhead of always returning the Map<Object, Object> for the current global/local dicts. In that case maybe a second method void exec(string) could improve the speed a lot here.

cnuernber commented 2 years ago

Hmm. So one thing is we are running a script to get the global dict and not caching it which is odd.

But in general I wouldn't call into python that way. It would look more like:

   pyEngine.eval("def calcSpread(bid,ask):\n\treturn ask-bid\n\n");
   clojure.lang.IFn calcSpread = (clojure.lang.IFn) pyEngine.getGlobal("calcSpread");
   // resolve the function once, then reuse it for every tick:
   loop:
      Double result = (Double) calcSpread.invoke(bid, ask);
   end-loop:

That wouldn't parse anything at all once things got going.

subes commented 2 years ago

This is just a simple example to see which library causes the most overhead. Note the text below the results: https://github.com/invesdwin/invesdwin-context-python/blob/master/README.md#solution

There are definitely better ways to write production code for this. ^^

cnuernber commented 2 years ago

Right, but my point is that the performance results aren't indicative of what will actually happen because of the calling convention.

cnuernber commented 2 years ago

In your profile there are things I can fix. I am not certain that anything should be calling find-pylib-fn or the sequence operator or a few things from what I see. Caching the global map would probably make a large difference and furthermore caching the conversion of the map keys to python objects and using them directly in the get would also probably be quicker. Ideally you only parse the string once and return some level of parsed thing which is something I hadn't considered before.

But all of that will still be quite a bit slower than just creating a function and calling it directly.
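
The "cache the global map" idea can be sketched on the client side too. The holder below loads the map once and reuses it; the loader is a stand-in for the runString("") lookup shown earlier in the thread, and both the class name and the assumption that the interpreter keeps handing back the same globals dict are mine.

```java
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical holder that fetches the globals map once and then reuses it. */
final class CachedGlobals {
    // Assumed wiring: () -> (Map<Object, Object>) java_api.runString("").get("globals")
    private final Supplier<Map<Object, Object>> loader;
    private volatile Map<Object, Object> cached;

    CachedGlobals(Supplier<Map<Object, Object>> loader) {
        this.loader = loader;
    }

    Map<Object, Object> get() {
        Map<Object, Object> local = cached;
        if (local == null) {
            synchronized (this) {
                local = cached;
                if (local == null) {
                    // First access: run the expensive lookup exactly once.
                    local = loader.get();
                    cached = local;
                }
            }
        }
        return local;
    }
}
```

This only removes the repeated script run; each get/put on the map still converts keys and values, so it does not replace the function-call convention described above.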

subes commented 2 years ago

Here is a benchmark of the function convention:

public class PythonStrategy extends StrategySupport {

    private final String instrumentId;
    private IScriptTaskEngine pythonEngine;
    private ITickCache tickCache;
    private int countPythonCalls = 0;
    private Instant start;
    private Instant lastLog;
    private IFn calcSpread;

    public PythonStrategy(final String instrumentId) {
        this.instrumentId = instrumentId;
    }

    @Override
    public void onInit() {
        tickCache = getBroker().getInstrumentRegistry()
                .getInstrumentOrThrow(instrumentId)
                .getDataSource()
                .getTickCache();
    }

    @Override
    public void onStart() {
        //        pythonEngine = Py4jScriptTaskEnginePython.newInstance();
        //        pythonEngine = JythonScriptTaskEnginePython.newInstance();
        //        pythonEngine = JepScriptTaskEnginePython.newInstance();
        pythonEngine = LibpythoncljScriptTaskEnginePython.newInstance();

        pythonEngine.eval("def calcSpread(bid,ask):\n\treturn abs(ask-bid)\n\n");
        final IPythonEngineWrapper unwrap = (IPythonEngineWrapper) pythonEngine.unwrap();
        calcSpread = (clojure.lang.IFn) unwrap.get("calcSpread");

        start = new Instant();
        lastLog = new Instant();
    }

    @Override
    public void onTickTime() {
        final ATick lastTick = tickCache.getLastTick(null);
        final double pythonSpread = Doubles
                .checkedCast(calcSpread.invoke(lastTick.getAskAbsolute(), lastTick.getBidAbsolute()));
        countPythonCalls++;
        Assertions.checkEquals(lastTick.getSpreadAbsolute(), pythonSpread);
        if (lastLog.isGreaterThan(Duration.ONE_SECOND)) {
            //CHECKSTYLE:OFF
            System.out.println("Python Calls: " + new ProcessedEventsRateString(countPythonCalls, start.toDuration()));
            //CHECKSTYLE:ON
            lastLog = new Instant();
        }
    }

    @Override
    public void onStop() {
        if (pythonEngine != null) {
            pythonEngine.close();
            pythonEngine = null;
        }
    }

}

=> 182.01/ms Python calls with 104.07/ms ticks (due to libpython-clj startup being included in the ticks measure, otherwise it should be 1-1)

image

subes commented 2 years ago

And here another version that keeps the GIL-lock all the time:

public class PythonStrategy extends StrategySupport {

    private final String instrumentId;
    private IScriptTaskEngine pythonEngine;
    private ITickCache tickCache;
    private int countPythonCalls = 0;
    private Instant start;
    private Instant lastLog;
    private IFn calcSpread;

    public PythonStrategy(final String instrumentId) {
        this.instrumentId = instrumentId;
    }

    @Override
    public void onInit() {
        tickCache = getBroker().getInstrumentRegistry()
                .getInstrumentOrThrow(instrumentId)
                .getDataSource()
                .getTickCache();
    }

    @Override
    public void onStart() {
        //        pythonEngine = Py4jScriptTaskEnginePython.newInstance();
        //        pythonEngine = JythonScriptTaskEnginePython.newInstance();
        //        pythonEngine = JepScriptTaskEnginePython.newInstance();
        pythonEngine = LibpythoncljScriptTaskEnginePython.newInstance();

        GilLock.INSTANCE.lock();
        pythonEngine.eval("def calcSpread(bid,ask):\n\treturn abs(ask-bid)\n\n");
        final IPythonEngineWrapper unwrap = (IPythonEngineWrapper) pythonEngine.unwrap();
        calcSpread = (clojure.lang.IFn) unwrap.get("calcSpread");

        start = new Instant();
        lastLog = new Instant();
    }

    @Override
    public void onTickTime() {
        final ATick lastTick = tickCache.getLastTick(null);
        final double pythonSpread = Doubles
                .checkedCast(calcSpread.invoke(lastTick.getAskAbsolute(), lastTick.getBidAbsolute()));
        countPythonCalls++;
        Assertions.checkEquals(lastTick.getSpreadAbsolute(), pythonSpread);
        if (lastLog.isGreaterThan(Duration.ONE_SECOND)) {
            //CHECKSTYLE:OFF
            System.out.println("Python Calls: " + new ProcessedEventsRateString(countPythonCalls, start.toDuration()));
            //CHECKSTYLE:ON
            lastLog = new Instant();
        }
    }

    @Override
    public void onStop() {
        if (pythonEngine != null) {
            pythonEngine.close();
            pythonEngine = null;
        }
        GilLock.INSTANCE.unlock();
    }

}

204.62/ms Python calls with 134.98/ms ticks

image

subes commented 2 years ago

The PyGILState_Check seems to still cause significant overhead despite having the lock the whole time. Does it use a ThreadLocal for this?

cnuernber commented 2 years ago

No I call that directly. I agree it causes a lot more overhead than I would think. Also RT.nthFrom is a bad sign - that is definitely an indication of an issue somewhere.

Now, what happens if you just create two numpy objects from double arrays and return a numpy object from a single call of calcSpread?

subes commented 2 years ago

Do you have some sample code to do that? Also even ThreadLocal is quite slow in lots of cases.

cnuernber commented 2 years ago

Yes for sure. The createArray function will do that, and I think everything else is unchanged because the numpy arrays overload the - operator.

I screwed up the function documentation. Something like (capture-gil):

   // shape holds one entry per dimension, so a 1-D array of numIters values is {numIters}
   int[] shape = new int[] { numIters };
   double[] bid = new double[numIters];
   double[] ask = new double[numIters];
   Object result = calcSpread.invoke(java_api.createArray("float64", shape, bid), java_api.createArray("float64", shape, ask));
   Object jvmResult = java_api.arrayToJVM(result);

subes commented 2 years ago

Now I get what you mean. That approach would be useful for vector based strategy tests (which makes the strategy developer deal with lookahead bias and would require the strategy to be recoded for live trading). Would also require to have lots of ticks in memory which is unsuitable for a live trading capable event based strategy. But surely that would have the least overhead since one would only call once into python for the whole dataset, so there is no real overhead. This follows the "Solution" that I linked above which should make it more or less irrelevant how slow the language integration is (thus which integration is being used). But I am sure libpython-clj could have the best performance here since it uses zero copy numpy integration (which will shine for large datasets). So dunno if it makes sense to test this here since this would be an apples to oranges test.

cnuernber commented 2 years ago

It definitely changes it from the original form. The zero-copy integration wouldn't be used here; there would be exactly three memcpy operations to bulk copy from the jvm heap into the native heap and back. But everything else you said is reasonable - I optimized heavily for bulk operations as those have a much higher top end in terms of potential raw performance. I would think even in raw trading you would have some level of batching, as that would make the processing costs far less, but those may be overall irrelevant.

And if you had batches and batches of data then you could copy data into an existing native dataset and you can see how this is going to dominate any other approach. You don't allocate anything really, just copy data into double arrays and have an optimized copy into python then run the function. I guess the result in python is always allocated but if that is the difference between success and failure you are really on a damn tight schedule.

There could be faster function call mechanisms such as caching the argument tuple and writing to exact locations in memory especially if the functions have a fixed number of arguments and as I said the clojure.RT call is super suspicious but what you have is the best that libpython-clj can do given its current design - I would have to do real work to get it faster.
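
To make the batching idea concrete: ticks are copied into preallocated double arrays and the (expensive) cross-language call happens once per full batch. In this sketch the batch function is a plain Java stand-in; in real use it would presumably wrap the createArray / calcSpread.invoke / arrayToJVM sequence from the snippet above, which is an assumption. The class name and the flush-when-full policy are hypothetical.

```java
import java.util.Optional;
import java.util.function.BinaryOperator;

/** Hypothetical buffer that calls a batch function once per `capacity` ticks. */
final class TickBatcher {
    private final double[] bid;
    private final double[] ask;
    private final BinaryOperator<double[]> batchFn; // stand-in for the single Python call
    private int size;

    TickBatcher(int capacity, BinaryOperator<double[]> batchFn) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.bid = new double[capacity];
        this.ask = new double[capacity];
        this.batchFn = batchFn;
    }

    /** Stores one tick; returns the batch result when the buffer fills, otherwise empty. */
    Optional<double[]> add(double bidPrice, double askPrice) {
        // No allocation on the hot path: values go into the preallocated arrays.
        bid[size] = bidPrice;
        ask[size] = askPrice;
        size++;
        if (size < bid.length) {
            return Optional.empty();
        }
        size = 0; // the arrays are reused for the next batch
        return Optional.of(batchFn.apply(bid, ask));
    }
}
```

The trade-off is the one discussed above: per-tick latency goes up by the batch fill time, while the per-call overhead is paid once per batch instead of once per tick.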

subes commented 2 years ago

K, I think this is unrelated to creating a java binding. So I think we can close this issue. If there are future versions to test I am happy to benchmark again. Thanks a lot for the great work. :)

And yes, in normal use cases one would always have more data to transport due to the lookback (but that could also be appended [ideally via zero copy into shared memory] by one datapoint for each tick to reduce redundant copying), or have more significant calculations on the python side, even in an event based system. The benchmark just maximizes the measure of the communication overhead. Which is what matters for low latency trading. But for those cases I normally reimplement functionality in java instead of relying on integration with other languages.

cnuernber commented 2 years ago

Great, that was fun. Great issue!

subes commented 2 years ago

Was also lots of fun to discuss this with you. Not many people are interested in low level stuff like this. :D

Though I opened a follow up issue here: https://github.com/clj-python/libpython-clj/issues/192

cnuernber commented 2 years ago

Perfect!

cnuernber commented 2 years ago

There is a new fastcall pathway for your use case - repeatedly calling a function with only positional arguments. It appears to be about 2x to 3x as fast as before, or more, depending on which machine it is run on.

In addition, if you are certain you are locking/unlocking correctly, you can define -Dlibpython_clj.manual_gil=true when running, and this will disable the automatic gil management completely - every call after initialize will then need to be made with the lock held. This got another 10% or so, I would say - the fastcall optimization is the much bigger one.

Finally I added helpers to make calling clojure objects easier so you don't need to do the casting yourself.

The fastcallable is of course callable with the call method so you can optimize exactly the line of code you need to and leave the rest unchanged.

subes commented 2 years ago

Thanks a lot. I am trying to incorporate this as a transparent cache for a faster alternative to runString: https://github.com/invesdwin/invesdwin-context-python/blob/master/invesdwin-context-python-parent/invesdwin-context-python-runtime-libpythonclj/src/main/java/de/invesdwin/context/python/runtime/libpythonclj/internal/UncheckedPythonEngineWrapper.java

Though I am getting exceptions when I want to register my fake-functions using makeFastcallable:

java.lang.RuntimeException: java.util.concurrent.CompletionException: java.lang.Exception: Item def fastCallable():
    restoreContext()
    return 1 is not convertible to a C pointer
    at de.invesdwin.context.python.runtime.libpythonclj.LibpythoncljScriptTaskEnginePython.eval(LibpythoncljScriptTaskEnginePython.java:30)
    at de.invesdwin.context.python.runtime.libpythonclj.LibpythoncljScriptTaskEnginePython.close(LibpythoncljScriptTaskEnginePython.java:47)
    at de.invesdwin.context.python.runtime.libpythonclj.internal.UncheckedPythonEngineWrapper.init(UncheckedPythonEngineWrapper.java:47)
    at de.invesdwin.context.python.runtime.libpythonclj.internal.InitializingPythonEngineWrapper.maybeInit(InitializingPythonEngineWrapper.java:24)
    at de.invesdwin.context.python.runtime.libpythonclj.internal.InitializingPythonEngineWrapper.getInstance(InitializingPythonEngineWrapper.java:54)
    at de.invesdwin.context.python.runtime.libpythonclj.LibpythoncljScriptTaskRunnerPython.run(LibpythoncljScriptTaskRunnerPython.java:37)
    at de.invesdwin.context.python.runtime.contract.AScriptTaskPython.run(AScriptTaskPython.java:12)
    at de.invesdwin.context.python.runtime.contract.hello.HelloWorldScript.testHelloWorld(HelloWorldScript.java:45)
    at de.invesdwin.context.python.runtime.contract.InputsAndResultsTests.test(InputsAndResultsTests.java:23)
    at de.invesdwin.context.python.runtime.libpythonclj.LibpythoncljScriptTaskRunnerPythonTest.test(LibpythoncljScriptTaskRunnerPythonTest.java:19)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:78)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:567)
    at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:725)
    at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
    at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
    at org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:149)
    at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:140)
    at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:84)
    at org.junit.jupiter.engine.execution.ExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(ExecutableInvoker.java:115)
    at org.junit.jupiter.engine.execution.ExecutableInvoker.lambda$invoke$0(ExecutableInvoker.java:105)
    at org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
    at org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
    at org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
    at org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
    at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:104)
    at org.junit.jupiter.engine.execution.ExecutableInvoker.invoke(ExecutableInvoker.java:98)
    at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:214)
    at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
    at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:210)
    at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:135)
    at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:66)
    at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151)
    at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
    at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
    at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
    at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
    at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
    at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
    at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
    at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:41)
    at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:155)
    at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
    at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
    at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
    at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
    at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
    at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
    at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
    at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.invokeAll(SameThreadHierarchicalTestExecutorService.java:41)
    at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:155)
    at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
    at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
    at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
    at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
    at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
    at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
    at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95)
    at org.junit.platform.engine.support.hierarchical.SameThreadHierarchicalTestExecutorService.submit(SameThreadHierarchicalTestExecutorService.java:35)
    at org.junit.platform.engine.support.hierarchical.HierarchicalTestExecutor.execute(HierarchicalTestExecutor.java:57)
    at org.junit.platform.engine.support.hierarchical.HierarchicalTestEngine.execute(HierarchicalTestEngine.java:54)
    at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:107)
    at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:88)
    at org.junit.platform.launcher.core.EngineExecutionOrchestrator.lambda$execute$0(EngineExecutionOrchestrator.java:54)
    at org.junit.platform.launcher.core.EngineExecutionOrchestrator.withInterceptedStreams(EngineExecutionOrchestrator.java:67)
    at org.junit.platform.launcher.core.EngineExecutionOrchestrator.execute(EngineExecutionOrchestrator.java:52)
    at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:114)
    at org.junit.platform.launcher.core.DefaultLauncher.execute(DefaultLauncher.java:95)
    at org.junit.platform.launcher.core.DefaultLauncherSession$DelegatingLauncher.execute(DefaultLauncherSession.java:91)
    at org.junit.platform.launcher.core.SessionPerRequestLauncher.execute(SessionPerRequestLauncher.java:60)
    at org.eclipse.jdt.internal.junit5.runner.JUnit5TestReference.run(JUnit5TestReference.java:98)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:40)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:529)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:756)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:452)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:210)
Caused by: java.util.concurrent.CompletionException: java.lang.Exception: Item def fastCallable():
    restoreContext()
    return 1 is not convertible to a C pointer
    at com.github.benmanes.caffeine.cache.LocalLoadingCache.lambda$newMappingFunction$2(LocalLoadingCache.java:148)
    at com.github.benmanes.caffeine.cache.BoundedLocalCache.lambda$doComputeIfAbsent$14(BoundedLocalCache.java:2413)
    at java.base/java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1916)
    at com.github.benmanes.caffeine.cache.BoundedLocalCache.doComputeIfAbsent(BoundedLocalCache.java:2411)
    at com.github.benmanes.caffeine.cache.BoundedLocalCache.computeIfAbsent(BoundedLocalCache.java:2394)
    at com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:108)
    at com.github.benmanes.caffeine.cache.LocalLoadingCache.get(LocalLoadingCache.java:54)
    at de.invesdwin.context.python.runtime.libpythonclj.internal.UncheckedPythonEngineWrapper.exec(UncheckedPythonEngineWrapper.java:64)
    at de.invesdwin.context.python.runtime.libpythonclj.LibpythoncljScriptTaskEnginePython.eval(LibpythoncljScriptTaskEnginePython.java:28)
    ... 78 more
Caused by: java.lang.Exception: Item def fastCallable():
    restoreContext()
    return 1 is not convertible to a C pointer
    at tech.v3.datatype.ffi.ptr_value$unchecked_ptr_value.invokeStatic(ptr_value.clj:17)
    at tech.v3.datatype.ffi.ptr_value$unchecked_ptr_value.invokePrim(ptr_value.clj)
    at tech.v3.datatype.ffi.ptr_value$ptr_value.invokeStatic(ptr_value.clj:27)
    at tech.v3.datatype.ffi.ptr_value$ptr_value.invokePrim(ptr_value.clj)
    at tech.v3.datatype.ffi.jna$ptr_value.invokeStatic(jna.clj:65)
    at tech.v3.datatype.ffi.jna.G__15248.PyCallable_Check(Unknown Source)
    at tech.v3.datatype.ffi.jna.G__15248$invoker_PyCallable_Check.invoke(Unknown Source)
    at libpython_clj2.python.ffi$PyCallable_Check.invokeStatic(ffi.clj:458)
    at libpython_clj2.python.ffi$PyCallable_Check.invoke(ffi.clj:458)
    at libpython_clj2.python.fn$make_fastcallable.invokeStatic(fn.clj:396)
    at libpython_clj2.python.fn$make_fastcallable.invoke(fn.clj:387)
    at clojure.lang.Var.invoke(Var.java:384)
    at libpython_clj2.java_api$_makeFastcallable.invokeStatic(java_api.clj:301)
    at libpython_clj2.java_api$_makeFastcallable.invoke(java_api.clj:287)
    at libpython_clj2.java_api.makeFastcallable(Unknown Source)
    at de.invesdwin.context.python.runtime.libpythonclj.internal.UncheckedPythonEngineWrapper.fastCallableCache_load(UncheckedPythonEngineWrapper.java:100)
    at com.github.benmanes.caffeine.cache.LocalLoadingCache.lambda$newMappingFunction$2(LocalLoadingCache.java:141)
    ... 86 more

Also, I think we are missing overloads of call and fastcall that take no parameters (arity 0).

subes commented 2 years ago

Here is the complete fake function:

def fastCallable():
    someMethod()  # or some code lines
    return 1

cnuernber commented 2 years ago

Ah, this takes an existing python callable, not a string - here is the unit test - https://github.com/clj-python/libpython-clj/blob/4cb7ba432e10a6d30f592611a2ca8eb537fc45e2/test/libpython_clj2/java_api_test.clj#L31
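
A minimal Python-side sketch of that distinction, assuming nothing about libpython-clj internals beyond the PyCallable_Check frame visible in the trace above: the source text of a function is only a `str`, so a callable check rejects it. The source has to be executed first, and the resulting function object is then looked up by name. The `namespace` dict and the body of `fastCallable` are illustrative.

```python
# Source text of a function is only a string, not a callable object.
source = "def fastCallable():\n    return 1\n"
print(callable(source))  # False

# Executing the source defines the function inside a namespace dict.
namespace = {}
exec(source, namespace)

# The function object fetched from the namespace is what a callable check accepts.
fast_callable = namespace["fastCallable"]
print(callable(fast_callable))  # True
print(fast_callable())          # 1
```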

subes commented 2 years ago

That works. Then I get an error about passing a null argument because there is no 0 arity overload available:

Caused by - java.lang.Exception: Pointer value is zero!
        at tech.v3.datatype.ffi.ptr_value$ptr_value.invokeStatic(ptr_value.clj:29)
        at tech.v3.datatype.ffi.ptr_value$ptr_value.invokePrim(ptr_value.clj)
        at tech.v3.datatype.ffi.jna$ptr_value.invokeStatic(jna.clj:65)
        at tech.v3.datatype.ffi.jna.G__15248.PyObject_CallObject(Unknown Source)
        at tech.v3.datatype.ffi.jna.G__15248$invoker_PyObject_CallObject.invoke(Unknown Source)
        at libpython_clj2.python.ffi$PyObject_CallObject.invokeStatic(ffi.clj:458)
        at libpython_clj2.python.ffi$PyObject_CallObject.invoke(ffi.clj:458)
        at libpython_clj2.python.fn$fastcall.invokeStatic(fn.clj:337)
        at libpython_clj2.python.fn$fastcall.invoke(fn.clj:337)
        at clojure.lang.Var.invoke(Var.java:384)
        at libpython_clj2.java_api$_fastcall.invokeStatic(java_api.clj:276)
        at libpython_clj2.java_api$_fastcall.invoke(java_api.clj:264)
        at libpython_clj2.java_api.fastcall(Unknown Source)
      * at de.invesdwin.context.python.runtime.libpythonclj.internal.UncheckedPythonEngineWrapper.fastEval(UncheckedPythonEngineWrapper.java:68) *
      * at de.invesdwin.context.python.runtime.libpythonclj.internal.UncheckedPythonEngineWrapper.exec(UncheckedPythonEngineWrapper.java:60) *
      * at de.invesdwin.context.python.runtime.libpythonclj.LibpythoncljScriptTaskEnginePython.eval(LibpythoncljScriptTaskEnginePython.java:28) *
      * at de.invesdwin.context.python.runtime.libpythonclj.LibpythoncljScriptTaskEnginePython.close(LibpythoncljScriptTaskEnginePython.java:47) *
      * at de.invesdwin.context.python.runtime.libpythonclj.internal.UncheckedPythonEngineWrapper.init(UncheckedPythonEngineWrapper.java:49) *
      * at de.invesdwin.context.python.runtime.libpythonclj.internal.InitializingPythonEngineWrapper.maybeInit(InitializingPythonEngineWrapper.java:24) *
      * at de.invesdwin.context.python.runtime.libpythonclj.internal.InitializingPythonEngineWrapper.getInstance(InitializingPythonEngineWrapper.java:54) *
      * at de.invesdwin.context.python.runtime.libpythonclj.LibpythoncljScriptTaskRunnerPython.run(LibpythoncljScriptTaskRunnerPython.java:37) *
      * at de.invesdwin.context.python.runtime.contract.AScriptTaskPython.run(AScriptTaskPython.java:12) *
      * at de.invesdwin.context.python.runtime.contract.hello.HelloWorldScript.testHelloWorld(HelloWorldScript.java:45) *
      * at de.invesdwin.context.python.runtime.contract.InputsAndResultsTests.test(InputsAndResultsTests.java:23) *
      * at de.invesdwin.context.python.runtime.contract.InputsAndResultsTests$1.run(InputsAndResultsTests.java:47) *
      * at de.invesdwin.util.concurrent.internal.WrappedRunnable.run(WrappedRunnable.java:47) *
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
      * at de.invesdwin.util.concurrent.internal.WrappedThreadFactory.lambda$0(WrappedThreadFactory.java:44) *
        ... 2 more, see error.log

Using a fake argument gives this error:

Caused by - java.lang.Exception: Item 1 is not convertible to a C pointer
        at tech.v3.datatype.ffi.ptr_value$unchecked_ptr_value.invokeStatic(ptr_value.clj:17)
        at tech.v3.datatype.ffi.ptr_value$unchecked_ptr_value.invokePrim(ptr_value.clj)
        at tech.v3.datatype.ffi.ptr_value$ptr_value.invokeStatic(ptr_value.clj:27)
        at tech.v3.datatype.ffi.ptr_value$ptr_value.invokePrim(ptr_value.clj)
        at tech.v3.datatype.ffi.jna$ptr_value.invokeStatic(jna.clj:65)
        at tech.v3.datatype.ffi.jna.G__15248.PyObject_CallObject(Unknown Source)
        at tech.v3.datatype.ffi.jna.G__15248$invoker_PyObject_CallObject.invoke(Unknown Source)
        at libpython_clj2.python.ffi$PyObject_CallObject.invokeStatic(ffi.clj:458)
        at libpython_clj2.python.ffi$PyObject_CallObject.invoke(ffi.clj:458)
        at libpython_clj2.python.fn$fastcall.invokeStatic(fn.clj:337)
        at libpython_clj2.python.fn$fastcall.invoke(fn.clj:337)
        at clojure.lang.Var.invoke(Var.java:384)
        at libpython_clj2.java_api$_fastcall.invokeStatic(java_api.clj:276)
        at libpython_clj2.java_api$_fastcall.invoke(java_api.clj:264)
        at libpython_clj2.java_api.fastcall(Unknown Source)
      * at de.invesdwin.context.python.runtime.libpythonclj.internal.UncheckedPythonEngineWrapper.fastEval(UncheckedPythonEngineWrapper.java:68) *
      * at de.invesdwin.context.python.runtime.libpythonclj.internal.UncheckedPythonEngineWrapper.exec(UncheckedPythonEngineWrapper.java:60) *
      * at de.invesdwin.context.python.runtime.libpythonclj.LibpythoncljScriptTaskEnginePython.eval(LibpythoncljScriptTaskEnginePython.java:28) *
      * at de.invesdwin.context.python.runtime.libpythonclj.LibpythoncljScriptTaskEnginePython.close(LibpythoncljScriptTaskEnginePython.java:47) *
      * at de.invesdwin.context.python.runtime.libpythonclj.internal.UncheckedPythonEngineWrapper.init(UncheckedPythonEngineWrapper.java:49) *
      * at de.invesdwin.context.python.runtime.libpythonclj.internal.InitializingPythonEngineWrapper.maybeInit(InitializingPythonEngineWrapper.java:24) *
      * at de.invesdwin.context.python.runtime.libpythonclj.internal.InitializingPythonEngineWrapper.getInstance(InitializingPythonEngineWrapper.java:54) *
      * at de.invesdwin.context.python.runtime.libpythonclj.LibpythoncljScriptTaskRunnerPython.run(LibpythoncljScriptTaskRunnerPython.java:37) *
      * at de.invesdwin.context.python.runtime.contract.AScriptTaskPython.run(AScriptTaskPython.java:12) *
      * at de.invesdwin.context.python.runtime.contract.hello.HelloWorldScript.testHelloWorld(HelloWorldScript.java:45) *
      * at de.invesdwin.context.python.runtime.contract.InputsAndResultsTests.test(InputsAndResultsTests.java:23) *
      * at de.invesdwin.context.python.runtime.contract.InputsAndResultsTests$1.run(InputsAndResultsTests.java:47) *
      * at de.invesdwin.util.concurrent.internal.WrappedRunnable.run(WrappedRunnable.java:47) *
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
      * at de.invesdwin.util.concurrent.internal.WrappedThreadFactory.lambda$0(WrappedThreadFactory.java:44) *
        ... 2 more, see error.log

See the current version here: https://github.com/invesdwin/invesdwin-context-python/blob/master/invesdwin-context-python-parent/invesdwin-context-python-runtime-libpythonclj/src/main/java/de/invesdwin/context/python/runtime/libpythonclj/internal/UncheckedPythonEngineWrapper.java

I also tried fetching a predefined Python object via globals.get("...") and passing it into fastcall, but this gives the same error.

cnuernber commented 2 years ago

Ok. Will test that pathway and get back to you.

subes commented 2 years ago

It seems that makeFastcallable combined with call (the 0-arity overload) works, whereas fastcall (1-arity) does not.

cnuernber commented 2 years ago

I think the issue is that fastcall always takes the context as the first argument, then the item. So java_api.fastcall(nil, item); would work for a zero-arity call, but honestly the makeFastcallable system is just more robust.

Travis gets these numbers for the calling conventions given your original 2 argument pathway:

Python calls/ms 803.7326204495045
Python fastcall calls/ms 2587.2968711932776
Python fastcallable calls/ms 2525.994580812058

On my laptop running the same tests (with automatic GIL management still enabled):

Testing libpython-clj2.stress-test
Python calls/ms 1871.1326226413623
Python fastcall calls/ms 3631.9062044823413
Python fastcallable calls/ms 3569.0528059350663

With GIL management disabled, once the system warms up:

libpython-clj2.stress-test> py-ffi/manual-gil
true
libpython-clj2.stress-test> (fastcall)
Python calls/ms 1758.6479896937037
Python fastcall calls/ms 3756.2619045799947
Python fastcallable calls/ms 3908.9197616587776
nil
libpython-clj2.stress-test> (fastcall)
Python calls/ms 2110.4355154454997
Python fastcall calls/ms 3967.9204739855595
Python fastcallable calls/ms 4121.459952342404
nil
libpython-clj2.stress-test> (fastcall)
Python calls/ms 2134.3139020230656
Python fastcall calls/ms 4143.687674016754
Python fastcallable calls/ms 4148.154610665761
nil

subes commented 2 years ago

The cache is not needed. One only has to define a fastcallable function that runs exec and receives 1 parameter.

When I try to extract the builtin exec:

final Map<?, ?> mainModule = libpython_clj2.java_api.runString("");
this.globals = (Map<Object, Object>) mainModule.get("globals");
this.builtins = (Map<Object, Object>) globals.get("__builtins__");
this.execFastCallable = libpython_clj2.java_api.makeFastcallable(builtins.get("exec"));

I get the following error (it seems __builtins__ is not a map):

Caused by - java.lang.ClassCastException: class libpython_clj2.python.bridge_as_jvm$generic_pyobject$reify__14348 cannot be cast to class java.util.Map (libpython_clj2.python.bridge_as_jvm$generic_pyobject$reify__14348 is in unnamed module of loader clojure.lang.DynamicClassLoader @39fda266; java.util.Map is in module java.base of loader 'bootstrap')
      * at de.invesdwin.context.python.runtime.libpythonclj.internal.UncheckedPythonEngineWrapper.init(UncheckedPythonEngineWrapper.java:37) *
      * at de.invesdwin.context.python.runtime.libpythonclj.internal.InitializingPythonEngineWrapper.maybeInit(InitializingPythonEngineWrapper.java:24) *
      * at de.invesdwin.context.python.runtime.libpythonclj.internal.InitializingPythonEngineWrapper.getInstance(InitializingPythonEngineWrapper.java:54) *
      * at de.invesdwin.context.python.runtime.libpythonclj.LibpythoncljScriptTaskRunnerPython.run(LibpythoncljScriptTaskRunnerPython.java:37) *
      * at de.invesdwin.context.python.runtime.contract.AScriptTaskPython.run(AScriptTaskPython.java:12) *
      * at de.invesdwin.context.python.runtime.contract.hello.HelloWorldScript.testHelloWorld(HelloWorldScript.java:45) *
      * at de.invesdwin.context.python.runtime.contract.InputsAndResultsTests.test(InputsAndResultsTests.java:23) *
      * at de.invesdwin.context.python.runtime.contract.InputsAndResultsTests$1.run(InputsAndResultsTests.java:47) *
      * at de.invesdwin.util.concurrent.internal.WrappedRunnable.run(WrappedRunnable.java:47) *
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
      * at de.invesdwin.util.concurrent.internal.WrappedThreadFactory.lambda$0(WrappedThreadFactory.java:44) *
        ... 2 more, see error.log

When I instead create a wrapper function around exec: https://github.com/invesdwin/invesdwin-context-python/blob/master/invesdwin-context-python-parent/invesdwin-context-python-runtime-libpythonclj/src/main/java/de/invesdwin/context/python/runtime/libpythonclj/internal/UncheckedPythonEngineWrapper.java

It seems I cannot get the executed code to use the global scope. It always appears to be bound to the local scope of the wrapper function rather than the outer global scope. I conclude this because the later get does not find the variable that was defined by the exec:

2022-01-01 18:28:25.830 [ |7-1:InputsAndResult] ERROR de.invesdwin.ERROR.process                                   - processing #00000001
de.invesdwin.context.log.error.LoggedRuntimeException: #00000001 org.opentest4j.AssertionFailedError: 
expected: "Hello World!"
 but was: null
        ... 13 omitted, see following cause or error.log
Caused by - org.opentest4j.AssertionFailedError: 
expected: "Hello World!"
 but was: null
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:78)
        at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
      * at de.invesdwin.context.python.runtime.contract.hello.HelloWorldScript.testHelloWorld(HelloWorldScript.java:46) *
      * at de.invesdwin.context.python.runtime.contract.InputsAndResultsTests.test(InputsAndResultsTests.java:23) *
      * at de.invesdwin.context.python.runtime.contract.InputsAndResultsTests$1.run(InputsAndResultsTests.java:47) *
      * at de.invesdwin.util.concurrent.internal.WrappedRunnable.run(WrappedRunnable.java:47) *
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
      * at de.invesdwin.util.concurrent.internal.WrappedThreadFactory.lambda$0(WrappedThreadFactory.java:44) *
        ... 2 more, see error.log
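
The scoping behavior described above can be reproduced in plain Python, independent of libpython-clj. This is a sketch under the assumption that the wrapper calls `exec` with a single argument; the names `run_in_function_scope`, `run_in_scope`, and `shared_scope` are hypothetical. Without an explicit namespace, assignments made by `exec` inside a function do not reach the module globals, while passing a dict as the second argument makes them land in that dict.

```python
def run_in_function_scope(code):
    # No namespace given: assignments go to the locals mapping of this call,
    # which is discarded when the function returns.
    exec(code)


def run_in_scope(code, scope):
    # Explicit namespace: top-level assignments in the code are stored in `scope`.
    exec(code, scope)


run_in_function_scope("lost_value = 'Hello World!'")
print("lost_value" in globals())  # False

shared_scope = {}
run_in_scope("hello = 'Hello World!'", shared_scope)
print(shared_scope["hello"])  # Hello World!
```

In the embedding scenario the dict passed as `scope` would presumably be the globals map of the main module, so that a later get sees the variable.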

subes commented 2 years ago

As per your suggestion, using libpython_clj2.java_api.fastcall(null, fastCallable, "__script__"); instead of libpython_clj2.java_api.call(fastCallable, "__script__"); gives a NullPointerException:

Caused by - java.lang.NullPointerException: Cannot load from object array because "xs" is null
        at clojure.lang.RT.aget(RT.java:2380)
        at libpython_clj2.python.fn$fastcall.invokeStatic(fn.clj:337)
        at libpython_clj2.python.fn$fastcall.invoke(fn.clj:337)
        at clojure.lang.Var.invoke(Var.java:393)
        at libpython_clj2.java_api$_fastcall.invokeStatic(java_api.clj:278)
        at libpython_clj2.java_api$_fastcall.invoke(java_api.clj:264)
        at libpython_clj2.java_api.fastcall(Unknown Source)
      * at de.invesdwin.context.python.runtime.libpythonclj.internal.UncheckedPythonEngineWrapper.fastEval(UncheckedPythonEngineWrapper.java:62) *
      * at de.invesdwin.context.python.runtime.libpythonclj.internal.UncheckedPythonEngineWrapper.exec(UncheckedPythonEngineWrapper.java:54) *
      * at de.invesdwin.context.python.runtime.libpythonclj.LibpythoncljScriptTaskEnginePython.eval(LibpythoncljScriptTaskEnginePython.java:28) *
      * at de.invesdwin.context.python.runtime.libpythonclj.LibpythoncljScriptTaskEnginePython.close(LibpythoncljScriptTaskEnginePython.java:47) *
      * at de.invesdwin.context.python.runtime.libpythonclj.internal.UncheckedPythonEngineWrapper.init(UncheckedPythonEngineWrapper.java:43) *
      * at de.invesdwin.context.python.runtime.libpythonclj.internal.InitializingPythonEngineWrapper.maybeInit(InitializingPythonEngineWrapper.java:24) *

cnuernber commented 2 years ago

builtins is a module so getAttr will work for that.
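
For illustration, the same distinction in plain Python: a module exposes its members as attributes rather than dict keys, so the lookup goes through `getattr`. This sketch uses the standard `builtins` module directly and does not show how the Java API spells its attribute accessor.

```python
import builtins
import types

# `builtins` is a module object, not a dict, so it has no key-based lookup.
print(isinstance(builtins, types.ModuleType))  # True
print(isinstance(builtins, dict))              # False

# Attribute access returns the built-in function itself.
exec_fn = getattr(builtins, "exec")
print(exec_fn is exec)  # True
```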

I think for your use case you really need that script-api to be better supported. So for instance you are going to do a string of setGlobal commands and then executeScript and then I guess a getGlobal.

This API is designed specifically so you can create a python function, not for a setvars, runscript, getGlobal access pattern.

Let's stick with the makeFastcallable pathway. Your last example is no longer a 0-arity pathway: it now has 1 argument, so it requires a context that you would need to allocate via allocateContext. fastcall can have a nil context for 0 arity, but for any other arity it requires a valid context. I think makeFastcallable is your best bet as long as you are careful to close the fastcallables when they are no longer useful.

I am not sure that calling python's exec function will be fastest. There is a way to compile a script to bytecode and just execute the bytecode and I do not know if exec will do that. For your use case I think I just have to support fast ways of setting global variables which means caching the python strings and then caching the compiled script. I am not sure we can work around anything else.

I think the fastcall pathway will be fastest if you have a function that takes some small (<6) number of arguments that you want to expose to java and call repeatedly. Calling a script that accesses global variables is a different use case and needs to be optimized individually outside the fastcall context.

subes commented 2 years ago

One can extract the exec builtin function from the globals via an alias variable:

        final String fastCallableName = "__fastCallable__";
        final Map<?, ?> mainModule = libpython_clj2.java_api.runString(fastCallableName + " = exec");
        this.globals = (Map<Object, Object>) mainModule.get("globals");
        this.fastCallable = libpython_clj2.java_api.makeFastcallable(globals.get(fastCallableName));

But when calling it I get this weird error:

Caused by - java.lang.Exception: SystemError: frame does not exist

        at libpython_clj2.python.ffi$check_error_throw.invokeStatic(ffi.clj:691)
        at libpython_clj2.python.ffi$check_error_throw.invoke(ffi.clj:689)
        at libpython_clj2.python.ffi$simplify_or_track.invokeStatic(ffi.clj:948)
        at libpython_clj2.python.ffi$simplify_or_track.invoke(ffi.clj:929)
        at libpython_clj2.python.fn$fastcall.invokeStatic(fn.clj:337)
        at libpython_clj2.python.fn$fastcall.invoke(fn.clj:337)
        at libpython_clj2.python.fn$make_fastcallable$reify__13773.invoke(fn.clj:398)
        at libpython_clj2.java_api$_call.invokeStatic(java_api.clj:236)
        at libpython_clj2.java_api$_call.invoke(java_api.clj:229)
        at libpython_clj2.java_api.call(Unknown Source)
      * at de.invesdwin.context.python.runtime.libpythonclj.internal.UncheckedPythonEngineWrapper.fastEval(UncheckedPythonEngineWrapper.java:59) *
      * at de.invesdwin.context.python.runtime.libpythonclj.internal.UncheckedPythonEngineWrapper.exec(UncheckedPythonEngineWrapper.java:51) *
      * at de.invesdwin.context.python.runtime.libpythonclj.LibpythoncljScriptTaskEnginePython.eval(LibpythoncljScriptTaskEnginePython.java:28) *
      * at de.invesdwin.context.python.runtime.libpythonclj.LibpythoncljScriptTaskEnginePython.close(LibpythoncljScriptTaskEnginePython.java:47) *
      * at de.invesdwin.context.python.runtime.libpythonclj.internal.UncheckedPythonEngineWrapper.init(UncheckedPythonEngineWrapper.java:40) *
      * at de.invesdwin.context.python.runtime.libpythonclj.internal.InitializingPythonEngineWrapper.maybeInit(InitializingPythonEngineWrapper.java:24) *