apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

Using SIMD for dealing with json (and more) at speed #13773

Open hpvd opened 1 month ago

hpvd commented 1 month ago

Using SIMD for dealing with json at speed

Inspired by PostgreSQL (up to 4-fold speedup); see: https://www.phoronix.com/news/PostgreSQL-Opt-JSON-Esc-SIMD

And since more and more CPUs support AVX-512 and its successors: https://www.phoronix.com/review/simdjson-avx-512. simdjson (https://simdjson.org/, https://github.com/simdjson/simdjson, Apache 2.0 license) is used by ClickHouse, Apache Doris, and others.

abhioncbr commented 1 month ago

@hpvd, what would you suggest, using the simdjson library for all JSON data handling or something else?

hpvd commented 1 month ago

I think this would be a multi-step approach. We can look at what is possible on https://simdjson.org/, pick one place in Pinot, and give it a try. In the end we can utilize it in many ways.

siddharthteotia commented 1 month ago

@hpvd @abhioncbr - I have been very interested in exploring a wider and more holistic use of SIMD in Pinot. Historically, that endeavor has not been successful because Java had no support for the low-level primitives. JNI is of course an option.

For this issue, how are you planning to use SIMD in the Pinot code base? Is it via a JNI bridge that we build over Intel compiler intrinsics, via an abstraction (e.g. the JDK Vector API, available as an incubator module since JDK 16 IIRC), or something else?
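For context on the abstraction option, here is a minimal sketch of what a Vector API kernel could look like for a query-engine-style predicate (hypothetical code, not Pinot's; current JDKs need `--add-modules jdk.incubator.vector` to compile and run it):

```java
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorMask;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Illustrative sketch only: count how many values in a column block
// satisfy value > threshold, using the widest SIMD width the JDK picks
// for this hardware.
public class VectorFilterSketch {
    private static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_PREFERRED;

    static int countGreaterThan(int[] values, int threshold) {
        int count = 0;
        int i = 0;
        int upper = SPECIES.loopBound(values.length);
        for (; i < upper; i += SPECIES.length()) {
            IntVector v = IntVector.fromArray(SPECIES, values, i);
            VectorMask<Integer> mask = v.compare(VectorOperators.GT, threshold);
            count += mask.trueCount();
        }
        for (; i < values.length; i++) {   // scalar tail for the leftover elements
            if (values[i] > threshold) {
                count++;
            }
        }
        return count;
    }
}
```

On AVX-512 hardware the preferred species processes 16 ints per iteration, on AVX2 it is 8, and the same bytecode falls back gracefully on narrower hardware.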

siddharthteotia commented 1 month ago

My high-level suggestion: if there is indeed a viable path to SIMD acceleration in Java, then rather than doing piecewise work for a specific scenario, it would be better to first get a handle on how it will be integrated into the Pinot code base, so that we can re-use it in other appropriate places (e.g. in the query engine). We also need to evaluate portability.

We can look at what is possible on https://simdjson.org/, pick one place in Pinot, and give it a try. In the end we can utilize it in many ways.

Agree with POCing one aspect, but when we actually decide to build the feature, it should ideally be designed with broader, long-term use in mind, since we are likely going to introduce platform-specific dependencies into the codebase.

hpvd commented 1 month ago

@siddharthteotia have you already looked into this one: https://github.com/simdjson/simdjson-java
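For reference, the entry point of that library looks roughly like the sketch below, based on its README; the exact package and method names should be verified against the version used (it also relies on the incubator vector module, so `--add-modules jdk.incubator.vector` is typically required):

```java
import org.simdjson.JsonValue;
import org.simdjson.SimdJsonParser;

// Hypothetical usage sketch modeled on the simdjson-java README;
// verify class/method names against the actual library release.
public class SimdJsonSketch {
    public static void main(String[] args) {
        byte[] json = "{\"name\":\"pinot\",\"stars\":5390}".getBytes();

        // The parser is reusable across documents and works on raw UTF-8 bytes.
        SimdJsonParser parser = new SimdJsonParser();
        JsonValue doc = parser.parse(json, json.length);

        System.out.println(doc.get("name").asString());
        System.out.println(doc.get("stars").asLong());
    }
}
```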

hpvd commented 1 month ago

We may also look into how Apache Doris leverages it...

abhioncbr commented 1 month ago

Yes, my understanding was also that we would use the simdjson Java bindings. As @hpvd suggested, we can explore how JDK-based projects are using it and take a path forward based on that.

siddharthteotia commented 1 month ago

we can explore how JDK-based projects are using it and take a path forward based on that.

+1. Yes, let's do a survey.

https://github.com/simdjson/simdjson-java

This is based on the incubator version of vector support in the JDK (from OpenJDK's Project Panama AFAIK). Note that the package still says "incubator", so I am not sure about production use / support for this. We have done this in the past, where we took a dependency on a less-than-productionized library (LBuffer) and it proved to be unstable once in a while; we recently removed it.

So, as a first step, I think it would be good to see whether any of the latest JDK versions actually support it before we go deeper into the POC / performance evaluation with the above library.
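One quick, hedged way to check is to run a one-file probe against the incubator module and see which SIMD width the local JDK prefers (`VectorApiCheck` is just an illustrative name):

```java
// Compile and run with:
//   java --add-modules jdk.incubator.vector VectorApiCheck.java
// If the module is missing, the launch fails, which answers the support question.
import jdk.incubator.vector.FloatVector;

public class VectorApiCheck {
    public static void main(String[] args) {
        // Typically prints 512 on AVX-512 machines and 256 on AVX2 machines.
        System.out.println("Preferred float vector width (bits): "
                + FloatVector.SPECIES_PREFERRED.vectorBitSize());
    }
}
```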

Take a look at project Gandiva (under Arrow) too. We can also build a JNI bridge ourselves.

I think the investment really depends on demonstrating some value via a POC.

Curious if @gortiz / @richardstartin have any advice / suggestions.

hpvd commented 1 month ago

This article is already a year old, but pretty interesting: it shows how Elastic / Lucene leverage SIMD, how they handle incubating APIs, some benchmarks, etc. https://www.elastic.co/de/blog/accelerating-vector-search-simd-instructions

hpvd commented 1 month ago

This covers the history, current state, and goals of the Vector API in Java: https://openjdk.org/jeps/469

kishoreg commented 1 month ago

This is a fantastic initiative and +100 on getting native SIMD. Given the pace at which Java is moving, it might be a good idea to slowly extract interfaces where SIMD can benefit. This will allow users/companies to stay on an older JDK while others move forward.

We don't want to be stuck in the same mode as last time, where moving off Java 8 meant waiting for all users to migrate off it.
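One possible shape for that kind of interface extraction, sketched below with hypothetical names (not actual Pinot APIs): callers code against a small interface with a portable scalar default, and a vectorized implementation is plugged in only when the runtime JDK and hardware support it, similar in spirit to Lucene's vectorization provider.

```java
// Hypothetical sketch of extracting an interface so SIMD-capable runtimes can
// swap in a faster implementation while older JDKs keep the scalar path.
@FunctionalInterface
public interface DotProduct {
    float dot(float[] a, float[] b);

    // Portable scalar fallback that works on any JDK.
    DotProduct SCALAR = (a, b) -> {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    };

    static DotProduct pick() {
        // A real integration would probe the JDK version / vector module here
        // (e.g. via reflection or a service loader) and return a Vector API- or
        // native-backed implementation when available; this sketch always
        // returns the scalar fallback.
        return SCALAR;
    }
}
```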

hpvd commented 1 month ago

Yep, it would be great if we find a way where the people who want to and can (no hard internal restrictions, suitable hardware, ...) are able to benefit from the new possibilities without having to wait until everybody is ready.

hpvd commented 1 month ago

just edited the title to SIMD for dealing with json *(and more)* at speed :-)

gortiz commented 1 month ago

Curious if @gortiz / @richardstartin have any advice / suggestions.

I think explorations in this area are very interesting, but AFAIK Panama is not fast enough yet. Last month at JCrete we were discussing how to access native code efficiently, and it looks like nothing has changed (yet). Calling JNI/Panama code per row is prohibitively slow. The good news is that in the single-stage engine and in the leaf stages of the multi-stage engine these calls can be done at block level, so we should be able to absorb the cost of the JNI call.
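To illustrate the block-level idea, here is a sketch using the Panama FFM API (JDK 22+ names; the JDK 21 preview differs slightly): one downcall per block of column values instead of one per row. The native library and function (`pinot_simd` / `pinot_simd_sum`) are hypothetical.

```java
import java.lang.foreign.*;
import java.lang.invoke.MethodHandle;

// Sketch only: batch a whole block of values into one native call so the
// JNI/FFM call overhead is amortized across thousands of rows.
public class BlockLevelNativeCall {
    public static void main(String[] args) throws Throwable {
        // Hypothetical native library exposing:
        //   double pinot_simd_sum(const double* values, size_t count);
        Linker linker = Linker.nativeLinker();
        SymbolLookup lookup = SymbolLookup.libraryLookup("libpinot_simd.so", Arena.global());
        MethodHandle sumBlock = linker.downcallHandle(
                lookup.find("pinot_simd_sum").orElseThrow(),
                FunctionDescriptor.of(ValueLayout.JAVA_DOUBLE,
                        ValueLayout.ADDRESS, ValueLayout.JAVA_LONG));

        double[] block = new double[10_000];   // one block of column values
        try (Arena arena = Arena.ofConfined()) {
            // Copy the block into off-heap memory once...
            MemorySegment segment = arena.allocate(ValueLayout.JAVA_DOUBLE, block.length);
            MemorySegment.copy(block, 0, segment, ValueLayout.JAVA_DOUBLE, 0, block.length);
            // ...then make a single native call for the whole block.
            double sum = (double) sumBlock.invokeExact(segment, (long) block.length);
            System.out.println(sum);
        }
    }
}
```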

hpvd commented 1 month ago

Good overview and starter: "SIMD Parallel Programming with the Vector API" by José Paumard

This session explains the differences between parallel streams and parallel computing, and how SIMD computations work internally, using simple examples. It then shows the code patterns that the Vector API provides along with their performance, and how you can use them to improve your in-memory data processing computations. More advanced techniques are also presented, to go beyond the basic examples.

https://www.youtube.com/watch?v=36DN9sE7ja4

Includes use cases and basic speed comparisons (screenshots from the talk omitted).

hpvd commented 2 days ago

Just to get an understanding of how other projects handle this: for Apache Lucene, being able to use more SIMD in an easy way is one of the reasons to make Java 21 mandatory in the upcoming major release of Lucene (v10, planned for October 01, 2024, see https://github.com/apache/lucene/milestone/2)

Vectorization

Parallelism and concurrency, while distinct, often translate to "splitting a task so that it can be performed more quickly", or "doing more tasks at once". Lucene is continually looking at new algorithms and striving to implement existing ones in more performant and efficient ways. One area that is now more straightforward to use in Java is data-level parallelism - the use of SIMD (Single Instruction Multiple Data) vector instructions to boost performance.

Lucene is using the latest JDK Vector API to implement vector distance computations that result in efficient hardware specific SIMD instructions. These instructions, when run on supporting hardware, can perform floating point dot product computations 8 times faster than the equivalent scalar code. This blog contains more specific information on this particular optimization.

With the move to Java 21 minimum, it is a lot more straightforward to see how we can use the JDK Vector API in more places. We're even experimenting with the possibility of calling customized SIMD implementations with FFI, since the overhead of the native call is now quite minimal.

https://www.elastic.co/search-labs/blog/lucene-and-java-moving-forward-together
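For illustration, the kind of dot-product kernel the quoted post describes looks roughly like this with the Vector API (a simplified sketch, not Lucene's actual implementation):

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Simplified sketch of a Vector API float dot product, in the spirit of the
// Lucene/Elastic optimization quoted above (not Lucene's actual code).
public class DotProductSketch {
    private static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    static float dot(float[] a, float[] b) {
        FloatVector acc = FloatVector.zero(SPECIES);
        int i = 0;
        int upper = SPECIES.loopBound(a.length);
        for (; i < upper; i += SPECIES.length()) {
            FloatVector va = FloatVector.fromArray(SPECIES, a, i);
            FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
            acc = va.fma(vb, acc);                  // acc += a[i..] * b[i..]
        }
        float sum = acc.reduceLanes(VectorOperators.ADD);
        for (; i < a.length; i++) {                 // scalar tail
            sum += a[i] * b[i];
        }
        return sum;
    }
}
```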