apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0
5.42k stars 1.27k forks source link

Use Foreign Memory API, introduced in Java 22 #12809

Open gortiz opened 5 months ago

gortiz commented 5 months ago

This is a sibling of #12810

TL;DR:

Create a new buffer implementation that uses official Java Foreign Memory API. See JEP-454.

As a requirement, this addition should:

Context

As we may know Java buffers length has been limited to 2GBs. In Apache Pinot we decided to skip that limit by using LArray, a third party buffer library that uses JNI to create larger buffers. That library is not maintained anymore (see https://github.com/xerial/larray/issues/75#issuecomment-1327176483 or https://github.com/xerial/larray/issues/80).\

This library has several problems, including:

In order to solve these problems, we introduced a new buffer implementation called unsafe. The name comes from the fact that uses sun.misc.Unsafe, but AFAIK it is safer than LArray. It works and that is what we use by default if at runtime we detect we are using Java >= 15.

But in March 2024, Java 22 was released. This new Java version includes JEP-454 Foreign Memory API, which includes new APIs that solve the original issue (be able to index buffers with 64 bits) while providing a more modern, safer, tested and stable API.

When I've added unsafe buffers, I've also tried Foreign Memory API (in preview state) and it was super simple to create a buffer implementation with that. The single problem I had was maven related. Given this API can only be used in Java 22 (or maybe 21 if preview features are enabled), we can only compile that code with that Java version and we need to be sure that code is not loaded if Java < 21 is used. At the time I didn't had time to deal with these two issues and therefore I'm creating this issue as a reminder for my future self or, alternative, as a birdcall in case someone else wants to learn more about Maven and/or Foreign Memory APi.

Proposed solution

aditya0811 commented 5 months ago

Hello @gortiz I can try looking into this. I primarily see two parts to this. a) Implementing buffer implementation using Foreign API by creating maven sub project. b) Configuring classes to be loaded for this sub-project with the detected Java. For now I will consider we are going with option 1.

For a) I will start going through what we have implemented currently in package : org.apache.pinot.segment.spi.memory.unsafe.

Let me know if there are any specifics we want to start looking into initially.

gortiz commented 5 months ago

I think there is a third part which may be more complex: How to compile a maven subproject that requires Java 22 while not affecting the rest of the project (which requires Java 11 or 8).

About point A, I already created one implementation at the same time I've created the unsafe buffers, but ended up removing it to do not have to deal with the third point I mentioned. The code is already in git (see for example this commit). Feel free to get inspiration from it

aditya0811 commented 5 months ago

ohk, in this case I will start looking for

How to compile a maven subproject that requires Java 22 while not affecting the rest of the project (which requires Java 11 or 8).

aditya0811 commented 5 months ago

@gortiz why cant we use reflection as discussed in para 1 here? And depending on java runtime invoke appropriate methods.

gortiz commented 5 months ago

why cant we use reflection as discussed in para 1 here?

What do you mean with that? To always call the methods using reflection? That sounds unsafe for sure and probably not very efficient. And the buffer code is part of the Pinot hot path, so it should be as efficient as possible. By unsafe I mean that if for whatever reason we end up even loading the class that is compiled with Java > 11 we will end up with an unexpected error in runtime.

Therefore we cannot add code compiled with Java 22 in the main source code because we need our executables to be compatible with Java 11 (and the driver with Java 8). The clean solution to do that is multi-module projects. You can see in Pinot distributables that our dependencies are already doing that (including for example roaringbitmap).

I think the solution named as single-project in maven documentation should be fine. Specifically, we need a project with no code in the main source folder (where Java 11 compatible code must stay) and then another folder for Java 22 should contain all the new code. Then with Maven we would only compile Java 22 code if the JRE running the compilation process is at least Java 22.

aditya0811 commented 5 months ago

ohk, let me try the solution in single project.

aditya0811 commented 4 months ago

@gortiz I tried defining folder structure this way, according to conventions. The profile activation is implicit, getting activated for jdk 22 or higher.

We can keep the folder pinot-plugins/pinot-foreign-memory/src/main/java empty. And use pinot-plugins/pinot-foreign-memory/src/main/java22.

Let me know your thoughts.

gortiz commented 3 months ago

I think it make sense. Once we have that, we would need to study how to publish the code in the pipeline. Right now we compile with Java 11, so we would only publish an empty jar

aditya0811 commented 3 months ago

@gortiz 1) Shall we start implementing Buffer using Foreign Memory by adding necessary files in pinot-plugins/pinot-foreign-memory/src/main/java22? Or is there any other discovery you would recommend before proceeding ahead. Maybe on the lines of how to publish the code in the pipeline.

2) If no discovery required, I will refer the code you shared earlier. I am yet to figure out few things regarding existing Buffer use case for Pinot. Can I use this space for any questions, or we want to move this discussion to maybe slack thread.

3) After (1) & (2), I would need your inputs regarding the way we would test this.

gortiz commented 3 months ago

I think we still need to figure out how to include this module in the CI compilation, but feel free to work on the actual code if you prefer.

Just to be clear, as indicated before, how to code the buffers using the new API is something we already figured out. The most challenger thing is how to mix code that requires modern versions of Java in the CI, packaging and runtime code.

About testing, we have unit tests that tests the buffer implementation. You would need to inherit PinotDataBufferTest. That should only be executed in Java 22 (or maybe Java 21 if --preview-enabled is set). We would also need to modify the CI to run integration tests with Java 22 and using this new buffer api. This can be done by using ENV variables in PinotDataBuffer (as we are doing right now with PINOT_OFFHEAP_SKIP_BYTEBUFFER (see https://github.com/apache/pinot/blob/master/.github/workflows/pinot_tests.yml)

aditya0811 commented 3 months ago

I will leave the implementation part, for now, for this discussion.

Let me try out things in these lines

how to mix code that requires modern versions of Java in the CI, packaging and runtime code.

I am trying to understand the steps in CI process. However, was unable to find the exact workflow file(.yml in .github/workflows) if we use it. Wanted to understand how we are packaging pinot before release.

gortiz commented 3 months ago

pinot-test.yml is the one that tests the code.

aditya0811 commented 2 months ago

1) Can we build the pinot with highest JDK version(>=22) in the CI? We have in pom.xml <release>${jdk.version}</release>, where jdk.version=11. <release> avoids any usage of future version APIs.
For FFM module we will keep 22. Also, can you please help me with the CI link, that is used to package pinot? I could not find in github workflows. 2) For UTs I am yet to find a way, that UTs can use the class files generated out of non source folder i.e. java22.

Another challenge with single project approach is the IDE does not recognize classes in java22 folder, during development.

gortiz commented 2 months ago

I recently found another place where compiling with Java 21 would be useful. I was trying to change the pipeline to support that, but I wasn't able to fix it yet and I cannot focus on that feature right now.

To change the Java version used during compilation without affecting the final code (ie let users run in Java 11) is something that is not trivial. Specifically, we need to be sure we are not breaking anything.

I've open https://github.com/apache/pinot/pull/13619 so you can take a look. We are probably going to need some help from @xiangfu0 to change the CIs (specially the one that publish docker images).

We should also discuss about the possibility of moving the codebase that is designed to be Java 11 compatible to 21 (aka everything but the driver).

hpvd commented 2 months ago

We should also discuss about the possibility of moving the codebase that is designed to be Java 11 compatible to 21 (aka everything but the driver).

+1 on this beside interesting new features and possible performance benefits we should also care for this because of End of Live (EOL) for Red Hat builds of Openjdk v11 on 30 Oct 2024 see e.g. https://endoflife.date/redhat-build-of-openjdk

@gortiz would you mind open a issue for that, or where would be a good place for discussion?

gortiz commented 2 months ago

we should also care for this because of End of Live (EOL) for Red Hat builds of Openjdk v11 on 30 Oct 2024 see e.g. https://endoflife.date/redhat-build-of-openjdk

Just to be clear, Pinot already runs in Java 21. In fact it is the recommended way to run Pinot because there are sensible performance improvements we get for free. We are just talking about the compilation process.

aditya0811 commented 2 months ago

Are these still open questions? I was unable to approach this as part of this issue. a) When we are adding src/main/java21, by configuring a different profile, how are we configuring UTs for classes present in src/main/java21? b) How are we planning to trigger java21 profile in CI, without compiling other code with Java 21. By default the maven picks up the java version it is configured with. I am assuming that java version is 11 as of now, not triggering compilation for src/main/java21.