apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0

[Java] Investigate potential performance improvement of compression codec #27743

Open asfimport opened 3 years ago

asfimport commented 3 years ago

In response to the discussion in https://github.com/apache/arrow/pull/8949/files#r588046787

There are some performance penalties in the implementation of the compression codecs (e.g. data copying between heap/off-heap data). We need to revise the code to improve the performance.
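To make the heap/off-heap copying penalty concrete, here is a minimal sketch that compresses the same off-heap buffer two ways. It is not Arrow's codec code: DEFLATE from `java.util.zip` stands in for LZ4 so the example needs no external library, the class and method names are hypothetical, and the `ByteBuffer` overloads of `Deflater` and `Inflater` require Java 11 or later.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Illustrative only: DEFLATE stands in for LZ4, and all names here are hypothetical.
final class DirectBufferCompression {

    // Slack for DEFLATE overhead; adequate for the small demo inputs used here.
    private static final int SLACK = 64;

    // Path A: off-heap -> heap copy, compress on heap, heap -> off-heap copy.
    static ByteBuffer compressViaHeapCopy(ByteBuffer direct) {
        byte[] input = new byte[direct.remaining()];
        direct.duplicate().get(input);                  // copy 1: off-heap -> heap
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] output = new byte[input.length + SLACK];
            int n = deflater.deflate(output);
            if (!deflater.finished()) {
                throw new IllegalStateException("output buffer too small");
            }
            ByteBuffer result = ByteBuffer.allocateDirect(n);
            result.put(output, 0, n);                   // copy 2: heap -> off-heap
            result.flip();
            return result;
        } finally {
            deflater.end();
        }
    }

    // Path B: pass the direct buffers straight to the codec (Java 11+ overloads).
    static ByteBuffer compressDirect(ByteBuffer direct) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(direct.duplicate());      // duplicate keeps the caller's position intact
            deflater.finish();
            ByteBuffer result = ByteBuffer.allocateDirect(direct.remaining() + SLACK);
            deflater.deflate(result);
            if (!deflater.finished()) {
                throw new IllegalStateException("output buffer too small");
            }
            result.flip();
            return result;
        } finally {
            deflater.end();
        }
    }

    // Helper used only to check that both paths produce decodable output.
    static byte[] decompress(ByteBuffer compressed, int originalLength) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed.duplicate());
            byte[] out = new byte[originalLength];
            int off = 0;
            while (off < out.length) {
                int n = inflater.inflate(out, off, out.length - off);
                if (n == 0) {
                    break;                              // stream finished or input exhausted
                }
                off += n;
            }
            return Arrays.copyOf(out, off);
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] data = "arrow arrow arrow arrow arrow arrow".getBytes(StandardCharsets.UTF_8);
        ByteBuffer direct = ByteBuffer.allocateDirect(data.length);
        direct.put(data);
        direct.flip();
        ByteBuffer viaHeap = compressViaHeapCopy(direct);
        ByteBuffer noCopy = compressDirect(direct);
        System.out.println("round trip (heap copy): " + Arrays.equals(data, decompress(viaHeap, data.length)));
        System.out.println("round trip (direct):    " + Arrays.equals(data, decompress(noCopy, data.length)));
    }
}
```

Path A performs two explicit copies per call that Path B avoids in user code. Whether that difference matters for a given codec is what the benchmarks requested in this issue would need to show.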

We should also provide some benchmarks to validate that the performance actually improves.

Reporter: Liya Fan / @liyafan82

Note: This issue was originally created as ARROW-11901. Please see the migration documentation for further details.

asfimport commented 3 years ago

Benjamin Wilhelm: Note that there is a discussion about the LZ4 library selection on the mailing list:

https://lists.apache.org/thread.html/reb8ae01ad544072ce1dd77feea640aab2e9834f55ccee04292e9da42%40%3Cdev.arrow.apache.org%3E

asfimport commented 3 years ago

Bob Tinsman / @bobtins: Hi, noticed that you were working on the LZ4 issue, which I was curious about, since Java and performance are both interests of mine.

I am happy to help by profiling code.

@emkornfield mentioned airlift as being Java based but still fast, so I checked it out.

Its core code uses off-heap access which could explain its speed.

For example, check out the core decompressor code: https://github.com/airlift/aircompressor/blob/master/src/main/java/io/airlift/compress/lz4/Lz4RawDecompressor.java

This is similar to Arrow's vector implementations, which allocate an off-heap chunk of memory, then use Unsafe methods to access it.
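As a rough illustration of that off-heap access pattern (not code taken from aircompressor or Arrow), the sketch below allocates a native memory chunk and reads and writes it through `sun.misc.Unsafe`. The class and method names are hypothetical. `sun.misc.Unsafe` is an unsupported internal API, and recent JDKs may print deprecation warnings for these memory-access methods.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Illustrative only: hypothetical names, relies on the unsupported sun.misc.Unsafe API.
final class OffHeapSketch {

    private static final Unsafe UNSAFE = loadUnsafe();

    // Unsafe has no public constructor; the usual workaround is reading its singleton field.
    private static Unsafe loadUnsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not accessible", e);
        }
    }

    // Writes 0..count-1 as longs into native memory, then sums them back by raw address.
    static long sumOffHeap(int count) {
        long bytes = (long) count * Long.BYTES;
        long address = UNSAFE.allocateMemory(bytes);    // off-heap chunk, invisible to the GC
        try {
            for (int i = 0; i < count; i++) {
                UNSAFE.putLong(address + (long) i * Long.BYTES, i);
            }
            long sum = 0;
            for (int i = 0; i < count; i++) {
                sum += UNSAFE.getLong(address + (long) i * Long.BYTES);
            }
            return sum;
        } finally {
            UNSAFE.freeMemory(address);                 // must be freed manually
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOffHeap(4));
    }
}
```

Reads and writes here are plain address arithmetic with no bounds checks, which is where the speed comes from and also why mistakes can crash the JVM rather than throw an exception.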

asfimport commented 3 years ago

Liya Fan / @liyafan82: @bobtins Thanks for your valuable input. It seems Airlift could solve the problem.

asfimport commented 3 years ago

Micah Kornfield / @emkornfield: As noted in the discussion on the mailing list, Airlift seems to support only raw, not framed, LZ4 (and is slower than JNI).

[~benjamin.wilhelm@knime.com] As a starting place for this specific Jira, maybe we can check in the benchmarks that you made for LZ4 compression. I think there are a few follow-up issues (not asking anyone in particular to work on them):

1.  Provide JNI bindings that support framed compression

2.  Provide a performant pure Java decompression for those that don't want to use JNI

3.  Use the existing LZ4 Java bindings for compression.
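For readers unfamiliar with the raw versus framed distinction behind these follow-ups: an LZ4 frame starts with the 4-byte magic number 0x184D2204 stored in little-endian order, whereas a raw LZ4 block has no header at all. The following is a minimal sketch of checking for that magic number; the class and method names are hypothetical, and it does not handle skippable frames or validate the rest of the frame header.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative only: hypothetical helper, checks just the 4-byte frame magic.
final class Lz4FrameCheck {

    // Magic number of the LZ4 frame format, stored little-endian on the wire.
    static final int LZ4_FRAME_MAGIC = 0x184D2204;

    // Returns true if the buffer's remaining bytes start with the LZ4 frame magic.
    static boolean startsWithFrameMagic(ByteBuffer buf) {
        if (buf.remaining() < Integer.BYTES) {
            return false;
        }
        // duplicate() leaves the caller's position and byte order untouched.
        int first = buf.duplicate().order(ByteOrder.LITTLE_ENDIAN).getInt(buf.position());
        return first == LZ4_FRAME_MAGIC;
    }

    public static void main(String[] args) {
        ByteBuffer framed = ByteBuffer.wrap(new byte[] {0x04, 0x22, 0x4D, 0x18, 0x40});
        ByteBuffer other = ByteBuffer.wrap(new byte[] {0x01, 0x02, 0x03, 0x04});
        System.out.println(startsWithFrameMagic(framed));
        System.out.println(startsWithFrameMagic(other));
    }
}
```

A decompressor that only understands raw blocks cannot consume such a stream directly, because it would have to parse the frame header and block sizes first.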

 

asfimport commented 2 years ago

Benjamin Wilhelm: For JNI bindings that support frame compression, I created a PR in the project javacpp-presets: https://github.com/bytedeco/javacpp-presets/pull/1094. Once this is merged I can implement an instance of CompressionCodec using these bindings.

asfimport commented 2 years ago

Micah Kornfield / @emkornfield: Does the presets library add a lot of value? Could this be done in a new package within Arrow? I'm a little hesitant to take a new dependency (or would at least want to do more research on the viability of the project and how widely the packages in the repo are used).

asfimport commented 2 years ago

Samuel Audet: @emkornfield, since the C++ builds of Arrow already include LZ4, it is indeed pretty trivial to expose a few JNI methods to access it. The larger picture though is that the overall Java API of Arrow itself is still pretty limited and inefficient, even after 5 years in development! And there are users such as [~benjamin.wilhelm@knime.com] that require more performance, and that's why there are also JavaCPP Presets for the C++ API of Arrow: https://github.com/bytedeco/javacpp-presets/tree/master/arrow

Now, the C++ API doesn't always map very elegantly to Java, but it is tons faster, and maps a lot more functionality. This would be a discussion for another thread, but if the Java API of Arrow were to be based on JavaCPP, it would allow users to fall back easily on that API, instead of forcing them to start hacking stuff in JNI. Case in point, the arrow::util::Codec class has been usable from Java for almost 2 years now: https://github.com/bytedeco/javacpp-presets/blob/master/arrow/src/gen/java/org/bytedeco/arrow/Codec.java

I would be happy to maintain those presets as part of the Arrow project, just like I'm currently doing in the case of TensorFlow for Java: https://github.com/tensorflow/java/search?q=javacpp

Previous discussions with people from Apache Arrow didn't elicit much interest, but in time the need for a tool like Cython in Java will become obvious to all, and JavaCPP already provides that!

asfimport commented 2 years ago

Benjamin Wilhelm: We at KNIME are currently using the official Java Arrow library for our upcoming table backend (https://www.knime.com/blog/improved-performance-with-new-table-backend). It works for us, and we will keep using it. As Samuel pointed out, it might be a valid idea to base the Java API on JavaCPP, but this is not the right place for this discussion (a thread in the mailing list?).

However, a significant problem with the Java API was/is the missing fast compression using LZ4. The JavaCPP project was the easiest and fastest way to get a very fast LZ4 API for Java (supporting frame compression as needed). I already implemented CompressionCodec using these bindings, and we (at KNIME) will use it with the next release.

Seeing where JavaCPP is used, I think it is a viable project. I could contribute my CompressionCodec implementation to Arrow if this is desired. Creating JNI bindings for LZ4 in the Arrow repository would take more time and I won't be able to do this soon.

asfimport commented 2 years ago

Micah Kornfield / @emkornfield:

> As Samuel pointed out, it might be a valid idea to base the Java API on JavaCPP, but this is not the right place for this discussion (a thread in the mailing list?).

This would be a dev@ mailing list discussion. I don't think we would eliminate the existing API, but there might be some interest in alternative Java APIs.

> Seeing where JavaCPP is used, I think it is a viable project. I could contribute my CompressionCodec implementation to Arrow if this is desired. Creating JNI bindings for LZ4 in the Arrow repository would take more time and I won't be able to do this soon.

[~benjamin.wilhelm@knime.com] Do you have pointers? I looked maybe too quickly and didn't see it used in other Apache projects, for instance. If you have something that works for your use case, that is great, and if you want to open-source it, also great, but it might need to live in a KNIME-hosted project for the time being. I believe Arrow is now building JNI bindings for all major platforms, so the release story is a little bit better for JNI code hosted by Arrow. I'll see how hard it would be to make the bindings at this point.

asfimport commented 2 years ago

Samuel Audet:

> This would be a dev@ mailing list discussion. I don't think we would eliminate the existing API, but there might be some interest in alternative Java APIs.

It's not about eliminating anything; it's about developing the existing Java API, such as this very specific use case for compression codecs. [~benjamin.wilhelm@knime.com] was able to wrap LZ4 using JavaCPP, all by himself! It's a lot easier to do than coding everything manually with JNI: https://github.com/bytedeco/javacpp-presets/pull/1094

The Python API of Arrow isn't just automatically generated wrappers around the C++ API using Cython, right? It's the same for Java. We can use tools like Cython to make the life of Python developers easier, so why not do the same for Java developers?

We were able to cut the wrapping code in half by rebasing the Java API of TensorFlow on JavaCPP, and performance increased to boot: https://github.com/tensorflow/java/pull/18#issuecomment-579600568

We could do the same for Arrow!

> [~benjamin.wilhelm@knime.com] Do you have pointers? I looked maybe too quickly and didn't see it used in other Apache projects, for instance. If you have something that works for your use case, that is great, and if you want to open-source it, also great, but it might need to live in a KNIME-hosted project for the time being. I believe Arrow is now building JNI bindings for all major platforms, so the release story is a little bit better for JNI code hosted by Arrow. I'll see how hard it would be to make the bindings at this point.

When it comes to Apache projects, I tried to donate the JavaCPP Presets for MXNet, but they don't seem interested anymore: https://github.com/apache/incubator-mxnet/pull/19797

I'm also publishing builds for Apache TVM as well, but again, not getting much traction: http://bytedeco.org/news/2020/12/12/deploy-models-with-javacpp-and-tvm/

If you have some ideas as to why most engineers are OK using Cython in the case of Python, but not the equivalent in the case of Java, I would be very much interested in hearing your opinions.

asfimport commented 2 years ago

Micah Kornfield / @emkornfield:

> It's not about eliminating anything; it's about developing the existing Java API, such as this very specific use case for compression codecs. [~benjamin.wilhelm@knime.com] was able to wrap LZ4 using JavaCPP, all by himself! It's a lot easier to do than coding everything manually with JNI: https://github.com/bytedeco/javacpp-presets/pull/1094

I think there is some miscommunication about what I thought were two separate issues: how to implement an efficient LZ4 decoder, and whether to base the Java API on a wrapper around the C++ API. The second would essentially need a heavy rewrite of the Java API, as it is fundamentally different from the design of the C++ API. I think there could be some interest from consumers of Arrow in an API that more accurately mimics the C++ version, but again, that is a different thread. It could be that for some of the more complex bindings (DataSets), JavaCPP might be a better choice than hand-coded JNI.

> @emkornfield, since the C++ builds of Arrow already include LZ4, it is indeed pretty trivial to expose a few JNI methods to access it.

I was not referring to binding to the C++ implementation here but directly to the LZ4 library. It looks like JavaCPP makes this efficient from a developer perspective. But the API isn't quite what I imagined: it looks like it goes through ByteBuffer, when all we really need is something like the ZSTD API. For such a minimal API, I'm ambivalent about taking on a new dependency here.

> If you have some ideas as to why most engineers are OK using Cython in the case of Python, but not the equivalent in the case of Java, I would be very much interested in hearing your opinions.

I'm not an expert, but a few thoughts:

  1. Cython is more than just a C++ wrapper. It speeds up Python even if you never want to write native code, by effectively allowing one to write C code as Python. In Java, at least in theory, the JIT can do some heavy lifting here.
  2. The Python GIL is a pain point that Java doesn't have, and Cython plus native code can effectively work around it.
  3. There has always been a tight relationship between Python and native code, whereas JNI is much more esoteric and can cause unexpected deployment issues (e.g. correctly pointing the JVM to .so files, correctly integrating with the JVM's memory capacity features, etc.).
  4. Cython was also a pretty easy way to get compatibility between Python 2.x and Python 3.x.

Sometimes there is a watershed moment, but more mature projects can be reluctant to try new technologies unless they are proven elsewhere and solve a significant pain point.

> We could do the same for Arrow!

The dev@ mailing list is the place to discuss this. I tried searching and couldn't find any previous discussions on the topic there.

asfimport commented 2 years ago

Benjamin Wilhelm: I will just add one small comment for now:

> But the API isn't quite what I imagined, it looks like it goes through ByteBuffer, when all we really need is something like ZSTD API

I just used ByteBuffer for the example code. If the memory address is known (which it is for ArrowBuf), a Pointer to this memory address can be used (which I do in my implementation of CompressionCodec). The API is still a bit annoying to use (mainly because decompression cannot be done with one call and it does not feel like a Java API), but it is not limited to Integer.MAX_VALUE bytes. Maybe there could be a convenience layer in the org.bytedeco.lz4 lib.

asfimport commented 2 years ago

Samuel Audet:

> I was not referring to binding to the C++ implementation here but directly to the LZ4 library. It looks like JavaCPP makes this efficient from a developer perspective. But the API isn't quite what I imagined, it looks like it goes through ByteBuffer, when all we really need is something like ZSTD API. For such a minimal API I'm ambivalent on taking on a new dependency here.

Could you expand on this point? Why do you consider zstd-jni to be minimal, but not code generated with JavaCPP? To me it looks like zstd-jni is a lot larger in size than the JavaCPP Presets for LZ4, even when considering only the builds in common:

https://repo1.maven.org/maven2/com/github/luben/zstd-jni/1.5.0-4/
https://repo1.maven.org/maven2/org/bytedeco/lz4/1.9.3-1.5.6/

As for the non-ByteBuffer API, what you are looking for are the overloads taking Pointer, which is just a fancy wrapper around a long value: https://github.com/bytedeco/javacpp-presets/blob/master/lz4/src/gen/java/org/bytedeco/lz4/global/lz4.java#L188

That works exactly like zstd-jni!