Make JNA Binding available to Java clients?

subes commented 2 years ago

Hi,

I would be interested in integrating Julia via JNA here: https://github.com/invesdwin/invesdwin-context-julia/

Currently I am trying Julia4j, which gives memory access errors after a few commands (maybe you have an idea about what causes those) and lacks error handling: https://github.com/rssdev10/julia4j/issues/2

Could you maybe provide a JNA layer that can be used from java code without clojure? Would be very much appreciated.

cnuernber commented 2 years ago

I would be interested in a java wrapper that you could use without directly calling clojure.

Here is a blogpost about how the ffi system works in case you are interested.

cnuernber commented 2 years ago

I do have an idea about the crashing - https://cnuernber.github.io/libjulia-clj/signals.html.

subes commented 2 years ago

Thanks for the input, I will try the signal chaining workaround. I guess with that one does not require the j_options function to disable signal handling (since Julia requires that for multithreading as I understood from your explanation).

Regarding the Java 16 FFI: I would prefer something that is compatible down to Java 8, since there are still lots of companies stuck on Java 8 or Java 11.

So if you prefer JNR over JNA, that is fine. It does not matter to me what I integrate as long as it gets the job done. JNR might be a bit faster, from what I understand, for situations where lots of calls are made into Julia.

I guess it would be easy for you to port this to java, since you already have lots of experience with this. :) I would be happy to integrate and test it then.

cnuernber commented 2 years ago

It would be difficult to port it directly to java - that is a bad assumption. Clojure is much more compact than java and furthermore the ffi layer it is built upon is a large part of the system. dtype-next is a complex piece of engineering that enables good support for algorithms across jvm-heap and native-heap datastructures. That is the piece that makes libraries like these so quick and it is a foundational piece - exactly the type that is not easily replaceable. If you would like to wrap this library in a pure java layer so users don't need to use Clojure to use it I would be interested in helping.

subes commented 2 years ago

Well, if it is possible to use libjulia-clj from java, then I am fine with that. What do I need to do to initialize/call it? I know from kotlin libraries that it is possible to use it as a normal jar from java. Examples being mapdb and okhttp.

cnuernber commented 2 years ago

Definitely possible. Glad you are open minded - will respond in detail soon.

subes commented 2 years ago

Regarding the signal chaining workaround. That did the trick. Though Julia still causes JVM crashes when calling it from other threads. Thus I created a workaround to always call julia from the same executor thread. Currently implemented with julia4j.
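
The same-thread workaround described above can be sketched in plain Java roughly as follows. This is a minimal sketch and not the code from invesdwin-context-julia: `JuliaThread`, `call` and the thread name are hypothetical, and the lambdas stand in for whatever native Julia call is actually being made.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper: funnels every task onto one dedicated daemon thread,
// so that a native runtime is only ever entered from that thread.
final class JuliaThread {
    private static final ExecutorService EXECUTOR = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "julia-thread");
        t.setDaemon(true); // do not keep the JVM alive just for this thread
        return t;
    });

    private JuliaThread() {}

    // Runs the task on the dedicated thread and blocks until it has finished.
    static <T> T call(Callable<T> task) {
        try {
            return EXECUTOR.submit(task).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the Julia thread", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed on the Julia thread", e.getCause());
        }
    }

    public static void main(String[] args) {
        // Stand-in for a real Julia call: report which thread ran the task.
        String first = call(() -> Thread.currentThread().getName());
        String second = call(() -> Thread.currentThread().getName());
        System.out.println(first + " " + second);
    }
}
```

Note that calling `call` from inside a task that is already running on the dedicated thread would deadlock in this sketch, so nested calls would need to be detected and run directly.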

To call libjulia-clj from Java I guess we need :gen-class directives in the Clojure files: https://stackoverflow.com/questions/2181774/calling-clojure-from-java

With the current version of libjulia-clj I cannot access anything from Java (imports cannot find any classes).

cnuernber commented 2 years ago

I am glad the signal chaining worked for you - that was some in-depth information that took some work to figure out.

The best way to call clojure from java is to use the extremely minimal public api. This is so minimal that calling things like the initialize function would require some work such as constructing keywords and most likely persistent maps or something along those lines.

The gen-class pathway is possible, but the above API would work without changes to the published jar and without requiring any AOT - it would, however, require more boilerplate initially. Given that julia4j is working for you, are you interested in pursuing this further or should we close this issue?

subes commented 2 years ago

I would still like to have multiple options integrated into invesdwin-context-julia. JNA has the benefit of not requiring a native compiled dll/so (which I currently only have compiled for linux using julia4j). Also error handling and getting string responses from julia is lacking with julia4j at the moment.

cnuernber commented 2 years ago

OK that is encouraging. Also note that libjulia-clj has zero copy support for dense nd objects so you can actually share memory between java and julia although you have to keep in mind that julia is column major while the underlying ND object system I use is row-major.
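
To illustrate the row-major versus column-major caveat: the sketch below is plain index arithmetic and does not use libjulia-clj at all. The class and method names are made up for the example. It shows that a flat buffer written as a row-major 2x3 matrix reads as the transposed 3x2 matrix when the same memory is interpreted column-major, which is how Julia lays out arrays.

```java
// Illustrative only: shows why a row-major buffer looks transposed when the
// same memory is interpreted as column-major.
final class MemoryLayoutDemo {
    private MemoryLayoutDemo() {}

    // Offset of element (row, col) in a row-major rows x cols matrix.
    static int rowMajorOffset(int row, int col, int rows, int cols) {
        return row * cols + col;
    }

    // Offset of element (row, col) in a column-major rows x cols matrix.
    static int columnMajorOffset(int row, int col, int rows, int cols) {
        return col * rows + row;
    }

    public static void main(String[] args) {
        // Row-major 2x3 matrix [[1,2,3],[4,5,6]] stored in one flat buffer.
        double[] buffer = {1, 2, 3, 4, 5, 6};
        // Element (1, 0) of the row-major 2x3 view.
        System.out.println(buffer[rowMajorOffset(1, 0, 2, 3)]);
        // Element (0, 1) of the column-major 3x2 view of the same buffer:
        // it lands on the same offset, i.e. the matrix appears transposed.
        System.out.println(buffer[columnMajorOffset(0, 1, 3, 2)]);
    }
}
```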

Do you have a minimal java project where you are trying these things out?

subes commented 2 years ago

Here is the project that I prepared with a dependency to libjulia-clj: https://github.com/invesdwin/invesdwin-context-julia/tree/main/invesdwin-context-julia-parent/invesdwin-context-julia-runtime-libjuliaclj

Zero copy support would be great. Regarding the minimal API provided by Clojure, I guess gen-class is preferable as it is generated and can not break by getting out of sync with the libjulia-clj implementation.

cnuernber commented 2 years ago

For gen-class to work we need a precompile step. Clojure libraries don't package the actual class files as they dynamically compile the .clj files upon require.

My recommendation is for me to create a small gen-class-based class that I will test with the unit testing system, and for the build system of your invesdwin libjuliaclj bindings to run a simple compilation step that writes all of the required class files to any desired directory. If you then package those class files with your jar, the Java import step will work.

cnuernber commented 2 years ago

Or perhaps I can upload a version of libjulia with the gen-class and class files in it. I think this may be the best step for now. I will reach out when I have something that I think will work.

subes commented 2 years ago

Sounds good.

cnuernber commented 2 years ago

First attempt - jar - https://clojars.org/com.cnuernber/libjulia-clj/versions/1.000-aot-beta-3

API docs - https://cnuernber.github.io/libjulia-clj/libjulia-clj.java-api.html

So, there should be a class in package libjulia_clj named java_api:

  -rw-rw-rw-      1992  26-Dec-2021  10:52:10  libjulia_clj/java_api.class

The functions should be without the - prefix, that just indicates to clojure not to mangle the names in any way.

It is easiest to export the env var JULIA_HOME just as in the local env script. You can also pass "julia-home" in as the key in the options map to initialize.

Small example unit test is here.

subes commented 2 years ago

I don't understand. Is it supposed to work by calling libjulia_clj.java_api.main("(japi/-initialize (jvm-map/hash-map {"n-threads" 8}))")? I don't see other public methods or a way to get back return values.

subes commented 2 years ago

Trying it:

    public static void main(final String[] args) {
        libjulia_clj.java_api.main(new String[] { "(japi/-initialize (jvm-map/hash-map {\"n-threads\" 8}))" });
    }

Results in:

Exception in thread "main" java.lang.UnsupportedOperationException: libjulia-clj.java-api/-main not defined
    at libjulia_clj.java_api.main(Unknown Source)
    at de.invesdwin.context.julia.runtime.libjuliaclj.internal.UnsafeJuliaEngineWrapper.main(UnsafeJuliaEngineWrapper.java:49)

This is what the generated class looks like:

// Warning: No line numbers available in class file

import clojure.lang.IFn;
import clojure.lang.RT;
import clojure.lang.Util;
import clojure.lang.Var;

public class java_api {
  private static final Var main__var = Var.internPrivate("libjulia-clj.java-api", "-main");

  private static final Var equals__var = Var.internPrivate("libjulia-clj.java-api", "-equals");

  private static final Var toString__var = Var.internPrivate("libjulia-clj.java-api", "-toString");

  private static final Var hashCode__var = Var.internPrivate("libjulia-clj.java-api", "-hashCode");

  private static final Var clone__var = Var.internPrivate("libjulia-clj.java-api", "-clone");

  static {
    Util.loadWithClass("/libjulia_clj/java_api", java_api.class);
  }

  public boolean equals(Object paramObject) {
    equals__var.isBound() ? (IFn)equals__var.get() : null;
    return ((equals__var.isBound() ? (IFn)equals__var.get() : null) != null) ? ((Boolean)((IFn)(equals__var.isBound() ? (IFn)equals__var.get() : null)).invoke(this, paramObject)).booleanValue() : super.equals(paramObject);
  }

  public String toString() {
    toString__var.isBound() ? (IFn)toString__var.get() : null;
    return ((toString__var.isBound() ? (IFn)toString__var.get() : null) != null) ? (String)((IFn)(toString__var.isBound() ? (IFn)toString__var.get() : null)).invoke(this) : super.toString();
  }

  public int hashCode() {
    hashCode__var.isBound() ? (IFn)hashCode__var.get() : null;
    return ((hashCode__var.isBound() ? (IFn)hashCode__var.get() : null) != null) ? ((Number)((IFn)(hashCode__var.isBound() ? (IFn)hashCode__var.get() : null)).invoke(this)).intValue() : super.hashCode();
  }

  public Object clone() {
    clone__var.isBound() ? (IFn)clone__var.get() : null;
    return ((clone__var.isBound() ? (IFn)clone__var.get() : null) != null) ? ((IFn)(clone__var.isBound() ? (IFn)clone__var.get() : null)).invoke(this) : super.clone();
  }

  public static void main(String[] paramArrayOfString) {
    if ((main__var.isBound() ? (IFn)main__var.get() : null) != null) {
      ((IFn)(main__var.isBound() ? (IFn)main__var.get() : null)).applyTo(RT.seq(paramArrayOfString));
    } else {
      throw new UnsupportedOperationException("libjulia-clj.java-api/-main not defined");
    }
  }
}

subes commented 2 years ago

This seems to work:

    public static void main(final String[] args) {
        final HashMap<String, Object> initParams = new HashMap<String, Object>() {
            {
                put("n-threads", 8);
            }
        };
        final Object call = libjulia_clj.java_api__init.const__3.invoke(initParams);
        System.out.println(call);
    }

Using this class:

/*    */  
/*    */    
/*    */    public static final Var const__0;
/*    */    public static final AFn const__1;
/*    */    public static final AFn const__2;
/*    */    public static final Var const__3;
/*    */    public static final AFn const__12;
/*    */    public static final Var const__13;
/*    */    public static final AFn const__16;
/*    */    public static final Var const__17;
/*    */    public static final AFn const__20;
/*    */    public static final Var const__21;
/*    */    public static final AFn const__24;
/*    */    public static final Var const__25;
/*    */    public static final AFn const__28;
/*    */    public static final Var const__29;
/*    */    public static final AFn const__32;
/*    */    
/*    */    public static void __init0() {
/*    */      const__0 = RT.var("clojure.core", "in-ns");
/*    */      const__1 = (AFn)Symbol.intern(null, "libjulia-clj.java-api");
/*    */      const__2 = (AFn)Symbol.intern(null, "clojure.core");
/*    */      const__3 = RT.var("libjulia-clj.java-api", "-initialize");
/*    */      const__12 = (AFn)RT.map(new Object[] { RT.keyword(null, "arglists"), PersistentList.create(Arrays.asList(new Object[] { Tuple.create(Symbol.intern(null, "options")) })), RT.keyword(null, "doc"), "Initialize the julia interpreter.  See documentation for [[libjulia-clj.julia/initialize!]].\n  Options may be null or must be a map of string->value for one of the supported initialization\n  values.\n\n  Example:\n\n```clojure\n  (japi/-initialize (jvm-map/hash-map {\"n-threads\" 8}))\n```", RT.keyword(null, "line"), Integer.valueOf(13), RT.keyword(null, "column"), Integer.valueOf(1), RT.keyword(null, "file"), "libjulia_clj/java_api.clj" });
/*    */      const__13 = RT.var("libjulia-clj.java-api", "-runString");
/*    */      const__16 = (AFn)RT.map(new Object[] { RT.keyword(null, "arglists"), PersistentList.create(Arrays.asList(new Object[] { Tuple.create(((IObj)Symbol.intern(null, "data")).withMeta(RT.map(new Object[] { RT.keyword(null, "tag"), Symbol.intern(null, "String") }))) })), RT.keyword(null, "doc"), "Run a string in Julia returning a jvm object if the return value is simple or\n  a julia object if not.  The returned object will have a property overloaded\n  toString method for introspection.", RT.keyword(null, "line"), Integer.valueOf(31), RT.keyword(null, "column"), Integer.valueOf(1), RT.keyword(null, "file"), "libjulia_clj/java_api.clj" });
/*    */      const__17 = RT.var("libjulia-clj.java-api", "-inJlContext");
/*    */      const__20 = (AFn)RT.map(new Object[] { RT.keyword(null, "arglists"), PersistentList.create(Arrays.asList(new Object[] { Tuple.create(((IObj)Symbol.intern(null, "fn")).withMeta(RT.map(new Object[] { RT.keyword(null, "tag"), Symbol.intern(null, "Function") }))) })), RT.keyword(null, "doc"), "Execute a function in a context where all julia objects created will be released\n  just after the function returns.  The function must return pure JVM data - it cannot\n  return a reference to a julia object.", RT.keyword(null, "line"), Integer.valueOf(39), RT.keyword(null, "column"), Integer.valueOf(1), RT.keyword(null, "file"), "libjulia_clj/java_api.clj" });
/*    */      const__21 = RT.var("libjulia-clj.java-api", "-namedTuple");
/*    */      const__24 = (AFn)RT.map(new Object[] { RT.keyword(null, "arglists"), PersistentList.create(Arrays.asList(new Object[] { Tuple.create(((IObj)Symbol.intern(null, "data")).withMeta(RT.map(new Object[] { RT.keyword(null, "tag"), Symbol.intern(null, "Map") }))) })), RT.keyword(null, "doc"), "Create a julia named tuple.  This is required for calling keyword functions.  The\n  path for calling keyword functions looks something like:\n\n  * `data` - must be an implementation of java.util.Map with strings as keys.\n\n```clojure\n(let [add-fn (jl \"function teste(a;c = 1.0, b = 2.0)\n    a+b+c\nend\")\n          kwfunc (jl \"Core.kwfunc\")\n          add-kwf (kwfunc add-fn)]\n      (is (= 38.0 (add-kwf (jl/named-tuple {'b 10 'c 20})\n                           add-fn\n                           8.0)))\n      (is (= 19.0 (add-kwf (jl/named-tuple {'b 10})\n                           add-fn\n                           8.0)))\n      (is (= 11.0 (add-kwf (jl/named-tuple)\n                           add-fn\n                           8.0)))\n\n      (is (= 38.0 (add-fn 8.0 :b 10 :c 20)))\n      (is (= 19.0 (add-fn 8 :b 10)))\n      (is (= 11.0 (add-fn 8))))\n```", RT.keyword(null, "line"), Integer.valueOf(49), RT.keyword(null, "column"), Integer.valueOf(1), RT.keyword(null, "file"), "libjulia_clj/java_api.clj" });
/*    */      const__25 = RT.var("libjulia-clj.java-api", "-createArray");
/*    */      const__28 = (AFn)RT.map(new Object[] { RT.keyword(null, "arglists"), PersistentList.create(Arrays.asList(new Object[] { Tuple.create(Symbol.intern(null, "datatype"), Symbol.intern(null, "shape"), Symbol.intern(null, "data")) })), RT.keyword(null, "doc"), "Return julia array out of the tuple of datatype, shape, and data.\n\n  * `datatype` - must be one of the strings `[\"int8\" \"uint8\" \"int16\" \"uin16\"\n  \"int32\" \"uint32\" \"int64\" \"uint64\" \"float32\" \"float64\"].\n  * `shape` - an array or implementation of java.util.List that specifies the row-major\n  shape intended of the data.  Note that Julia is column-major so this data will appear\n  transposed when printed via Julia.\n  * `data` may be a java array or an implementation of java.util.List.  Ideally data is\n  of the same datatype as data.", RT.keyword(null, "line"), Integer.valueOf(79), RT.keyword(null, "column"), Integer.valueOf(1), RT.keyword(null, "file"), "libjulia_clj/java_api.clj" });
/*    */      const__29 = RT.var("libjulia-clj.java-api", "-arrayToJVM");
/*    */      const__32 = (AFn)RT.map(new Object[] { RT.keyword(null, "arglists"), PersistentList.create(Arrays.asList(new Object[] { Tuple.create(Symbol.intern(null, "jlary")) })), RT.keyword(null, "doc"), "Returns a map with three keys - shape, datatype, and data.  Shape is an integer array,\n  datatype is a string denoting one of the supported datatypes, and data is a primitive\n  array of data.", RT.keyword(null, "line"), Integer.valueOf(96), RT.keyword(null, "column"), Integer.valueOf(1), RT.keyword(null, "file"), "libjulia_clj/java_api.clj" });
/*    */    }
/*    */    
/*    */    static {
/*    */      __init0();
/*    */      Compiler.pushNSandLoader(RT.classForName("libjulia_clj.java_api__init").getClassLoader());
/*    */      try {
/*    */        load();
/*    */      } finally {
/*    */        Var.popThreadBindings();
/*    */      } 
/*    */    } }

Though is there maybe a way to properly name those handles?

subes commented 2 years ago

Also, would it be possible to have a fallback available so that I can set JULIA_HOME based on a system property instead of an ENV variable? Then I could configure it programmatically before calling libjulia-clj.

subes commented 2 years ago

I got my tests green with this: https://github.com/invesdwin/invesdwin-context-julia/blob/main/invesdwin-context-julia-parent/invesdwin-context-julia-runtime-libjuliaclj/src/main/java/de/invesdwin/context/julia/runtime/libjuliaclj/internal/UnsafeJuliaEngineWrapper.java

[image]

I also needed the workaround of always using the same thread. initParams.put("signals-enabled?", false) did not work as an alternative to the LD_PRELOAD workaround. Maybe the ? is too much?

Also the error handling looks good: [image]

So good job so far!

With the JULIA_HOME system property or startup parameter (to configure it programmatically) I would be very happy. Naming the const__x handles a bit better would be a bonus.

cnuernber commented 2 years ago

Ah, I must have done something wrong w/r/t exporting the various functions. The functions should be something like public static Object initialize(Object options) ...

subes commented 2 years ago

Also this is by far the fastest integration so far:

Libjulia-clj: [image]

Julia4j: [image]

JuliaCaller: [image]

Jajub: [image]

Though I wonder why it is so much faster?

- With JuliaCaller and Jajub I think it might be the REPL+Pipe/Socket overhead
- With Julia4j it might be the workaround of getting strings via writing/reading files

subes commented 2 years ago

Also I noticed you developed: https://github.com/clj-python/libpython-clj Maybe we could also export java classes for that so I can integrate it into: https://github.com/invesdwin/invesdwin-context-python

cnuernber commented 2 years ago

Sure, libpython-clj could have nearly identical bindings, but let's get this process down first. There is lots more where that is concerned...

avclj is bindings to the ffmpeg shared libraries so you can encode/decode video.
We also have a data processing library that is damn fast. I explain why it is so fast here - for an independent developer to smash the bigger toolkits is no small feat.
The larger Clojure scicloj community has an ml package that can use the sklearn learners and comes with good smile bindings by default :-).

When I generate bindings for JNA every binding I generate is direct mapped. This gets similar speed to JNR. For granular function access JDK-17 is about twice as fast but it comes with some serious caveats in how it loads shared libraries. For graal native we can directly link the library into the final executable so that is another perf boost but it doesn't support generic callbacks so that is a serious weakness - hopefully the graal system supports the JDK-17 foreign api at some point.

I will rebuild the jar with correct symbols so the interfacing code isn't quite so harsh.

cnuernber commented 2 years ago

New jar is up: https://clojars.org/com.cnuernber/libjulia-clj/versions/1.000-aot-beta-4.

This should just have normal public static methods.

subes commented 2 years ago

scicloj looks really cool!

Would be happy to integrate that here (closed source though): [image]

Here the open source performance related stuff that I am working on:

NoSQL Database for financial data: https://github.com/invesdwin/invesdwin-context-persistence#timeseries-module
Channel-API for fast IPC: https://github.com/invesdwin/invesdwin-context-integration#synchronous-channels
A very fast expression language: https://github.com/invesdwin/invesdwin-util/blob/master/invesdwin-util-parent/invesdwin-util/doc/LanguageDefinition.pdf
Here a presentation how I use that to generate strategies for my doctorate: https://www.youtube.com/watch?v=Ilw8J_bfgwA&list=FLebnPcJPaUWYjEuJj6z7tSw

cnuernber commented 2 years ago

That is a very impressive set of modules. tmd is much faster and more sophisticated than tablesaw and it supports things like memory mapped arrow files - something the arrow java SDK itself doesn't support. I had never heard of jquantlib before but I have users who would be interested in that.

The NoSQL database is interesting - I would argue that the machinery around DAO's isn't worth it - keeping things as columns will lead to faster processing times in general. In some specifics I could see things being different but my experience, and as I indicate in my talk, is that columns allow hotspot to emit vectorizing instructions while processing the data in row-major DAO form with objects or otherwise does not. I think this is probably an age-old argument that has many tradeoffs and caveats.

Weka is GPLv2 so we stay away from it. Note that isn't LGPL - it is the full GPL.

What is the primary user interface to this system?

cnuernber commented 2 years ago

Another library then that you may like in the fast-data-pathways is tmducken which binds to duckdb at the C level. That one is just barely fleshed out but it works and is quite quick.

subes commented 2 years ago

Here some documentation about my usage of GPL'd code: https://github.com/invesdwin/invesdwin-context-r#license-discussion tl;dr: only use it for testing, it's a deployment/redistribution concern (personal usage is unaffected). If you want the gray-area-solution: wrap it in a CLI application (then it runs similar to gnu tools).

The NoSQL storage is only for keeping data compressed locally to a computation node. For processing there are way better formats (as you also suggest). I am using precomputed primitive arrays for my strategy generation stuff. Though I also have other backtesting engines that use multiple layers of selfoptimizing lookback caches on top of the database. For cases where data does not fit into memory or large portfolios are tested.

subes commented 2 years ago

Also, from what I understand, nowadays SMILE is also GPL, not LGPL anymore.

subes commented 2 years ago

Regarding DuckDB: anything that requires SQL parsing is too slow for my requirements (from what I have tested so far). Though I would be happy to create a benchmark if there is a java binding. Another no-go is the lack of compression when handling tick data.

subes commented 2 years ago

The named methods look good now. Regarding JULIA_HOME I would suggest this: [image]

cnuernber commented 2 years ago

That all makes a lot of sense. I do have a duckdb pathway - there is a way to test but I need to go for now. You can pass in "julia-home" as a key in the map and it will supersede the env var. I agree the name you suggest makes more sense.
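
A rough sketch of configuring this programmatically from Java is shown below. The map keys "n-threads" and "julia-home" are the ones mentioned in this thread; everything else is an assumption made for the example: `JuliaOptions`, `initOptions` and the system property name `julia.home` are hypothetical and not part of libjulia-clj.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper that builds the options map passed to initialize.
final class JuliaOptions {
    // Assumed property name, chosen for this example only.
    static final String JULIA_HOME_PROPERTY = "julia.home";

    private JuliaOptions() {}

    static Map<String, Object> initOptions(int nThreads) {
        Map<String, Object> options = new HashMap<>();
        options.put("n-threads", nThreads);
        // Only set "julia-home" when the property is present; otherwise the
        // library is left to fall back to the JULIA_HOME environment variable.
        String juliaHome = System.getProperty(JULIA_HOME_PROPERTY);
        if (juliaHome != null && !juliaHome.isEmpty()) {
            options.put("julia-home", juliaHome);
        }
        return options;
    }

    public static void main(String[] args) {
        System.setProperty(JULIA_HOME_PROPERTY, "/opt/julia");
        // The resulting map would then be handed to the library's
        // initialize function; that call is not made here.
        System.out.println(initOptions(8));
    }
}
```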

subes commented 2 years ago

This works: [image]

I don't care about the name. julia-home follows the naming pattern of the other variables. So it is fine like this.

subes commented 2 years ago

Regarding primary user interface:

- primarily coding in an IDE
- though I have web interfaces, desktop interfaces and lots of reports
- it is more a set of modules for building products with their respective frontends
  - the concept is called a software product line: https://github.com/invesdwin/invesdwin-context/blob/master/invesdwin-context-parent/invesdwin-context/doc/concept/invesdwin-concept.pdf
  - tl;dr: something like spring-boot just with reusable domain specific modules (not only technical modules). Better isolation of modules and composable integration tests. Deployment scenarios are a concern separated from the modules. The customer only gets what he pays for. Or like car platforms in the automobile industry with heavy customization capabilities.
  - It is a white label development platform for (mainly trading related) financial products.
- see slides 2.1 to 2.7 from this presentation for some screenshots: https://github.com/invesdwin/invesdwin-util/blob/master/invesdwin-util-parent/invesdwin-util/doc/DBA_Presentation_long.pdf

cnuernber commented 2 years ago

Sweet, thanks, this is all fascinating.

subes commented 2 years ago

Just watching some of your talks. Amazing work you are doing!

Not to be too greedy, but if it is possible to create a java binding for scicloj then I would also love to have a binding for: https://github.com/alanmarazzi/panthera

;)

subes commented 2 years ago

I have benchmarked DuckDB, Hsqldb, H2 and SQLite via their JDBC drivers. Results have been added to the performance table below: https://github.com/invesdwin/invesdwin-context-persistence/blob/master/README.md#timeseries-module

DuckDB only outperforms the other embedded SQL databases in iterator speed (using ORDER BY ASC; unordered is ~3x as fast). The other metrics are worse. I would not expect a native binding to make it much more suitable for timeseries queries (get/getlatest). But I can still benchmark that if you provide a java binding; it could perform as well as QuestDB in iterator speed. Though timeseries data has different requirements for storage than columnar data, which is more commonly used with machine learning. For timeseries data the write speed is just as important, for data pipelines in live trading or when loading/calculating from tick streams.

Also regarding your comment above about DAOs, those are only part of the integration modules for JPA. The NoSQL database has nothing to do with DAOs or SQL.

subes commented 2 years ago

I guess we can close this issue. I created a follow up issue here: https://github.com/clj-python/libpython-clj/issues/191

cnuernber commented 2 years ago

The timeseries implementation is really interesting. If I read the explanation correctly you store compressed batches of data in levelDB which means you get great read/write performance but you need a small caching layer in between the user and the data so the user doesn't see the compression system.

This is somewhat similar to how parquet works in that parquet compresses each column separately as you write it and it writes out chunks of each column on demand so you get this interleaved mix of column chunks on disk that get decompressed as you iterate through the data.

Your method seems clearer and simpler. I would bet that the per-column compression of parquet gets better overall compression, but it makes any sort of random access extremely difficult - I decompress the entire record set, which is usually a few hundred MB or so, when I read a parquet file. I don't think overall there is a faster method than what you came up with, especially with the in-memory cache in front of it, although there most likely could be tweaks here or there for particular column types. Do you store each chunk in row-major or column-major format? And when you decompress it, do you decompress into records or columnwise into primitive arrays?

cnuernber commented 2 years ago

I incorrectly defined the java-api interface for inJlContext. New AOT version is up - https://clojars.org/com.cnuernber/libjulia-clj/versions/1.000-aot-beta-5.

subes commented 2 years ago

Thanks, I upgraded to the new version.

Regarding the architecture of the database: LevelDB is only used as an index for the segment lookup. The segments are stored separately in a file. Writes are done with FileChannel, reads are done using a memory-mapped file (which makes heavy use of the OS file cache). Storing large payloads does not work with LevelDB (too much write amplification). Also the index is boosted via a write-through in-memory cache. The file is append-only (only the last segment can be rewritten). Each segment is compressed using LZ4 High or Fast depending on configuration.

Each segment contains 10k objects. Serialization/deserialization is done using an implementation of ISerde on top of an IByteBuffer abstraction (https://github.com/invesdwin/invesdwin-util#byte-buffers).
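
As a rough illustration of the write and read paths described above (append via FileChannel, read via a memory-mapped region), here is a stdlib-only sketch. It is not the invesdwin implementation: `AppendOnlySegmentFile` and its methods are hypothetical names, LZ4 compression is left out because it is not part of the JDK, and the returned offset merely stands in for what an index would store per segment.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch of an append-only segment file.
final class AppendOnlySegmentFile {
    private AppendOnlySegmentFile() {}

    // Appends one (already serialized) segment and returns its start offset.
    static long append(Path file, byte[] segment) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            long start = channel.size();
            ByteBuffer buffer = ByteBuffer.wrap(segment);
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
            return start;
        }
    }

    // Reads a segment back through a read-only memory mapping.
    static byte[] read(Path file, long offset, int length) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_ONLY, offset, length);
            byte[] out = new byte[length];
            mapped.get(out);
            return out;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("segments", ".bin");
        long first = append(file, new byte[] {1, 2, 3});
        long second = append(file, new byte[] {4, 5});
        System.out.println(first + " " + second + " " + read(file, second, 2).length);
    }
}
```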

The operations that are needed to be fast with time series data are:

- getLatest: uses the index to find the proper segment. Then searches that segment for the correct value. From there normally one starts to iterate.
- Iterate: can be done forward and reverse. Forward is easier. The caches always fetch a bunch of elements to keep a dynamic self optimizing lookback: https://github.com/invesdwin/invesdwin-util#caches
- For iterating backwards the FileBufferCache (an application level segment cache replicating the OS file cache, though on uncompressed and unmarshalled data) boosts performance a lot. Otherwise backwards iteration and cache misses in the lookback caches would be very expensive. It also boosts forward iteration between threads a lot. E.g. each thread runs a separate backtest, IO only occurs for the first thread that requires the data.
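
The getLatest lookup described above (index to find the segment, then a search inside the segment) can be sketched like this. It is a simplified, in-memory illustration with assumed names (`SegmentIndexDemo`, `addSegment`): a `TreeMap` stands in for the LevelDB index and each segment is just a sorted array of timestamps rather than a compressed block on disk.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of the two-step getLatest lookup.
final class SegmentIndexDemo {
    // First timestamp of each segment -> sorted timestamps in that segment.
    private final TreeMap<Long, long[]> index = new TreeMap<>();

    void addSegment(long[] sortedTimes) {
        index.put(sortedTimes[0], sortedTimes);
    }

    // Returns the latest stored timestamp <= time, or -1 if there is none.
    long getLatest(long time) {
        // Step 1: the index yields the segment whose first timestamp is <= time.
        Map.Entry<Long, long[]> entry = index.floorEntry(time);
        if (entry == null) {
            return -1;
        }
        // Step 2: binary search inside that segment.
        long[] segment = entry.getValue();
        int pos = Arrays.binarySearch(segment, time);
        if (pos < 0) {
            // Not found: binarySearch returns -(insertionPoint) - 1,
            // so the last element <= time sits at insertionPoint - 1.
            pos = -pos - 2;
        }
        return segment[pos];
    }

    public static void main(String[] args) {
        SegmentIndexDemo db = new SegmentIndexDemo();
        db.addSegment(new long[] {10, 20, 30});
        db.addSegment(new long[] {40, 50, 60});
        System.out.println(db.getLatest(35));
        System.out.println(db.getLatest(50));
    }
}
```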

Column vs Row access is a consideration to make better use of cpu prefetching. Though the current solution already handles that on multiple layers and using zero copy at multiple stages. Though when the data is in the application level FileBufferCache I currently store the data as heap objects (thus columnar data). The alternative would be to use the flyweight pattern in a special ISerde implementation that projects the data from the underlying decompressed byte[] array.

There is a task for that: https://github.com/invesdwin/invesdwin-context-persistence/issues/12
This could theoretically squeeze out an additional 30% performance.
Though this would have the drawback that the lifecycle of the uncompressed objects is bound to the lifecycle of the byte[] in the FileBufferCache. Evicting from the FileBufferCache will then not free the byte[] as long as any one object is still referenced. Thus memory consumption will grow very large, because sometimes user code is only interested in having one object of the segment (e.g. keeping an important value, like the first value of a backtest). There should be ways to balance this, but I have not yet started the refactoring to tackle this (since the 30% doesn't seem to be worth the effort yet).
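
For readers unfamiliar with the flyweight idea mentioned above, a minimal sketch follows. It is an assumption-laden illustration, not an ISerde implementation: `TickFlyweight` is a hypothetical class, the tick is reduced to three fields (time, ask, bid), and a plain `ByteBuffer` stands in for the decompressed segment.

```java
import java.nio.ByteBuffer;

// Hypothetical flyweight: a reusable view that reads tick fields directly
// from a decompressed segment buffer instead of materializing heap objects.
final class TickFlyweight {
    static final int TICK_BYTES = Long.BYTES + 2 * Double.BYTES; // time, ask, bid

    private ByteBuffer segment;
    private int offset;

    // Points this view at the tick with the given index inside the segment.
    TickFlyweight wrap(ByteBuffer segment, int tickIndex) {
        this.segment = segment;
        this.offset = tickIndex * TICK_BYTES;
        return this;
    }

    long time() {
        return segment.getLong(offset);
    }

    double ask() {
        return segment.getDouble(offset + Long.BYTES);
    }

    double bid() {
        return segment.getDouble(offset + Long.BYTES + Double.BYTES);
    }

    // Writes ticks back to back into one buffer (row layout).
    static ByteBuffer encode(long[] times, double[] asks, double[] bids) {
        ByteBuffer buffer = ByteBuffer.allocate(times.length * TICK_BYTES);
        for (int i = 0; i < times.length; i++) {
            buffer.putLong(times[i]).putDouble(asks[i]).putDouble(bids[i]);
        }
        return buffer;
    }

    public static void main(String[] args) {
        ByteBuffer segment = encode(new long[] {1L, 2L}, new double[] {1.5, 1.6}, new double[] {1.4, 1.55});
        TickFlyweight tick = new TickFlyweight().wrap(segment, 1);
        System.out.println(tick.time() + " " + tick.ask() + " " + tick.bid());
    }
}
```

The view keeps a reference to the whole segment buffer, which is exactly the lifecycle drawback described above: holding on to a single tick keeps the entire segment reachable.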

The uncompressed objects are normally Ticks (time, ask, bid, askVolume, bidVolume) or Bars (time, open, high, low, close). For my machine learning engine I actually transform those objects into primitive arrays for each value separately. Thus using Columnar instead of Row based access as you suggest to get the in-memory-speed that I need for hundreds of thousands to millions of backtests per second (on the CPU to be clear). I extract more columnar primitive arrays for results of indicators and reuse calculations. Boolean expressions are stored even better via bitsets which allow combining &&/|| conditions very fast without having to do complex calculations of indicators multiple times. The engine uses as much memory as possible to test strategies as fast as possible. Also any pointer dereferencing is poison at those speeds.
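
The bitset idea can be illustrated with `java.util.BitSet`. This is only a sketch under assumptions: `SignalBitsets` and its methods are made-up names, and the two threshold conditions stand in for arbitrary indicator expressions. Each condition is evaluated once per bar into a bitset, and combinations are then plain bitwise operations that never touch the indicator values again.

```java
import java.util.BitSet;

// Hypothetical helper: precompute boolean conditions per bar as bitsets and
// combine them with bitwise AND/OR.
final class SignalBitsets {
    private SignalBitsets() {}

    // Evaluates values[i] > threshold once per bar; one bit per bar.
    static BitSet above(double[] values, double threshold) {
        BitSet bits = new BitSet(values.length);
        for (int i = 0; i < values.length; i++) {
            if (values[i] > threshold) {
                bits.set(i);
            }
        }
        return bits;
    }

    // Bars where both conditions hold; inputs are left unmodified.
    static BitSet and(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.and(b);
        return result;
    }

    // Bars where at least one condition holds; inputs are left unmodified.
    static BitSet or(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.or(b);
        return result;
    }

    public static void main(String[] args) {
        double[] close = {1, 5, 3, 7};
        double[] volume = {10, 2, 8, 9};
        BitSet closeAbove2 = above(close, 2);
        BitSet volumeAbove5 = above(volume, 5);
        System.out.println(and(closeAbove2, volumeAbove5));
        System.out.println(or(closeAbove2, volumeAbove5));
    }
}
```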

Though for general live trading or complex portfolio scenarios, memory is more a concern. In that case the Row based access is kept and each data point has its own lifecycle. Pushing/Pulling new market data is easier that way and plotting old data can be queried dynamically in a thread safe (but slower) engine. I also have an engine that can use the columnar storage for complex portfolio backtests. Though it is limited by memory easily. I am working on a new circular buffer engine that uses columnar storage but is live capable (moving windows in primitive arrays) and allows thread safe access (required for plotting and semi automated trading). So in the end I also follow the principle there that more options allow better flexibility, so use the execution engine that best suits the task. Similar to being able to choose the language integration framework that best suits the task. Though doing this for backtesting and execution engines requires lots of testing to make sure that all engines produce the same outputs. I have thousands of test cases with gigabytes of reference files to ensure that. Also what I am writing here is only the tip of the iceberg of optimizations. Making backtests faster opens so many possibilities. The faster our tools become, the harder the problems we can solve.

With the primitive representation of the indicators, it would be possible to just load them onto a GPU (the CPU handles all the complex calculations and data loading). Then let the GPU do what it does best: combine strategies in thousands of threads and find combinations that perform well based on some simple fitness criterion. The GPU then outputs the best candidates after some evolutionary (or other ML) process (the CUDA/OpenCL code required for this is small and can be highly optimized). The CPU can then filter them and do risk management and robustness analysis to combine them into fully automated portfolios.

But let's leave it at that. I guess I drifted a bit too deep, sorry for that. :D

cnuernber commented 2 years ago

That is great. I haven't digested all of it yet, but I have an unrelated question. How sensitive are you to startup times? I would like to do without AOT while keeping the same Java interface you are using. This will result in 2-3 second compilation times but is overall simpler and avoids some versioning issues.

subes commented 2 years ago

Startup times are not too important.

cnuernber commented 2 years ago

New libjulia-clj version is up with no AOT - https://clojars.org/cnuernber/libjulia-clj

subes commented 2 years ago

This is the correct link: https://clojars.org/com.cnuernber/libjulia-clj Here are the new test timings: [screenshot] Before: [screenshot]

So compilation seems to take about 7 seconds. Though I wonder why the parallel test got slower. Maybe clojure now has some additional overhead for each call to determine if lazy compilation is still needed?

cnuernber commented 2 years ago

Perhaps. Is that startup time OK with you? Personally it drives me crazy, but the only other option is to have a parallel release of all of my jars, as precompiled classes interact poorly with development-time Clojure practices because the class loaders get confused.

subes commented 2 years ago

Here are new tests against 1.000-aot-beta-5: [screenshot] And new tests against 1.000-aot-beta-4: [screenshot]

So the compilation is more like 4 seconds. There is no runtime overhead. So everything is fine. Sometimes my notebook gets slower after returning from hibernation/suspend. I guess after a reboot the speed will be faster again.

cnuernber commented 2 years ago

Alright! We could eliminate those 4 seconds, but as I said, for at least this version it would be nice to be able to punt on that one.

subes commented 2 years ago

Yes, those 4 seconds are ok. Normally the processes run a bit longer. The platform takes a few seconds to initialize anyway with its bootstrap, and Load-Time-Weaving of AspectJ also adds some overhead. And since libjulia-clj/libpython-clj are optional dependencies anyway and both only get compiled when the functionality is accessed, everything is fine.

If one wants to have fast startup times, the JVM is not the right tool. Either work around it with GraalVM or implement the tool in Golang or something else that starts up instantly.
