elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
69.4k stars 24.56k forks source link

After update to Lucene 9.4 use `--enable-preview` on Java==19 (exact) to allow mmap use new JDK APIs #90526

Closed uschindler closed 1 year ago

uschindler commented 1 year ago

Description

Apache Lucene 9.4 will have support for Java 19 Panama APIs to mmap index files (using a MR-JAR). See https://github.com/apache/lucene/pull/912 for more information.

As those APIs are not yet enabled by default in the JDK, we have to still use some opt-in approach, controlled by Java's command line:

Some TODOs:

Important: Lucene 9.4 only supports this on Java 19 (exactly), because the APIs are in flux. If you start with Java 20, it falls back to the classical MMapDirectory. We will add support for Java 20 in a later release. The reason for this is that the class files of new implementation are marked by some special version numbers that make them ONLY compatible to Java 19, not earlier or later, to allow the JDK to apply changes to the API before final release in Java 21. But passing --enable-preview to later versions won't hurt, so maybe enable it on all versions >= 19.

A last note: The downside of this new code is that closing and unmapping an index file gets more heavy (it will trigger an safepoint in the JVM). We have not yet found out how much this impacts servers opening/closing index files a lot. Because of this we would really like Elastic / Elasticsearch to do benchmarking on this, ideally if their users and customers could optionally enable it. But benchmarking should be done now, because with hopefully Java 21, Lucene will use the new implementation by default. Java 20 will be the second and last preview round.

elasticsearchmachine commented 1 year ago

Pinging @elastic/es-delivery (Team:Delivery)

elasticsearchmachine commented 1 year ago

Pinging @elastic/es-core-infra (Team:Core/Infra)

mark-vieira commented 1 year ago

@rdnm Is this something the core/infra team could look into, given this would probably be implemented in JvmErgonomics?

uschindler commented 1 year ago

Hi @mark-vieira

Is this something the core/infra team could look into, given this would probably be implemented in JvmErgonomics?

Possibly, but I don't know how this works. The JVM options must be set before the main server is started as those are not system properties. It is real JVM options. If JvmErgonomics is called from a tool that runs before the main server with the same JVM, it could just do if (Runtime.version().feature() == 19) and then add the --enable-preview to the default JVM options. The check must be exact, not >= 19.

mark-vieira commented 1 year ago

Right, that's where it would happen. The purpose of the jvm ergonomics code is to "compute" JVM arguments based on whatever environmental conditions we decide.

thecoop commented 1 year ago

As this can impact performance, we probably want an ES-level option to enable this for 8.6 (default probably off, as this is a preview feature). I also note this is a general option that enables all preview APIs - although the other previews are opt-in or compiler only (record patterns, switch patterns, virtual threads). Given this, we probably want this for 19 only, we can then re-evaluate the previews for 20 when that is released.

thecoop commented 1 year ago

JUL redirection is covered by #90547

ChrisHegarty commented 1 year ago

We probably don't want this "on by default", at least not initially - it seems less than idea to rely on Preview features in production. And as @thecoop says, enabling is a JVM-wide property.

Initial steps can be to setup a CI job with Java 19 that passes ES_JAVA_OPTS=--enable-preview - to flush out any issues.

Folks can also opt-in by the same mechanism, passing ES_JAVA_OPTS=--enable-preview.

thecoop commented 1 year ago

@uschindler Is it enough to specify this as a direct JVM option for now?

uschindler commented 1 year ago

I also note this is a general option that enables all preview APIs - although the other previews are opt-in or compiler only (record patterns, switch patterns, virtual threads).

Actually you misunderstood what enable-preview does: It just enables the API layer of those fetaures, the features are all there and enabled. The preview setting is only there to make sure that code compiled against another version of the JDK will not start, in that case it is opt-in. The setting is not enabling or bootstrapping those features (e.g., virtual threads are always there, there is just no API visible to spawn them). So theres no risk to enable preview mode unless you actually use preview APIs. The main reason for the switch is that if you spawn an app that uses preview APIs in its class files that those would reject to load. Theres nothing more. If you use reflection or MethodHandles you can also access preview APIs and spawn virtual threads without that setting. That said: The setting is just a safeguard to prevent code compiled against preview APIs to be used by everyone without thinking, as it won't work anymore with later JDK version.

Actually the problem here is: I would really like to see people use it because this is now 2.5 years already tested daily on Lucene benchmarks and the performance is same (as of JDK 18 incubation and now also with JDK 19).

The only open point is possibly a slowdown on many close/reopen of indexes. But this will also be there in 20 and 21 as it is a feature by design and theres no way to prevent those safepoints. That said, in Java 21 there will be no way back, Lucene will switch to that code without any way back (for safety reason), although it might slightly raise the level of safepoints or slowdown code heavily closing index files.

So please enable this feature by default if JDK 19 is used, otherwise theres no way to see results. You can add an option to DISABLE it. I also have customers in the chain who fail with this max_map_count sysctl. Those will be very happy.

ChrisHegarty commented 1 year ago

I'm not sure that we've even tested with --enable-preview yet, but regardless we should. Also, and bigger, we'll need to look at the performance characteristics of the Panama Memory segment MMapDirectory implementation - which is yet to be done. (thanks for highlighting the use of thread local handshakes when closing)

I somewhat dislike that enabling this in Lucene will amount to enabling all preview features at runtime (from the perspective of the JVM and the Java Platform). But since we don't compile with preview features enabled, then maybe this is not such an issue, but still... leaves a small window open for plugins, etc. To clarify (from a runtime perspective) - preview features are more than just Preview APIs. Setting --enable-preview enables additional class file support in the JVM.

uschindler commented 1 year ago

To clarify (from a runtime perspective) - preview features are more than just Preview APIs. Setting --enable-preview enables additional class file support in the JVM.

From the runtime perspective, that is the only thing it does. It allows to load classes with minor version number 65535 when the major version is exactly matching the runtime's bytecode version. Basically javac "marks" all class files using preview APIs or preview features of Javac with that minor version as "flag": "This class is using new features". The instanceof switch statement for example or the records are/were such features.

uschindler commented 1 year ago

Funny detail: If Lucene's build would have removed the 65535 minor version from the class files (I tried this out), the classes load also without preview flag and the new impl in lucene would work out of box without any preview flags. But the risk for Lucene is too large: if APIs change in 20 (and they possibly will), it will produce linkage errors. But it was tempting :-)

ChrisHegarty commented 1 year ago

From the runtime perspective, that is the only thing it does. It allows to load classes with minor version number 65535 when the major version is exactly matching the runtime's bytecode version.

In the case of what we're discussing here, the Lucene Panama Memory segment MMapDirectory implementation, I agree - there is little or no risk.

More generally, allowing classes with minor version 65535 is just the start of what is enabled by preview features. As a specific example, when I worked on the records preview, the JVM would only parse the Record_attribute if preview was enabled - it would otherwise fail to parse that attribute if found in another class file (this is a bit of a silly example).

I'm not sure of the specifics of Java 19 preview features, so I dunno what could be lurking there (if anything), and how dangerous it could be if ES or a plugin depended upon one. Though these days, most features are implemented through indy and standard runtime bootstraps, rather than new byte codes. But still, enabling preview affects code paths in the JVM and runtime.

uschindler commented 1 year ago

I'm not sure of the specifics of Java 19 preview features, so I dunno what could be lurking there (if anything), and how dangerous it could be if ES or a plugin depended upon one. Though these days, most features are implemented through indy and standard runtime bootstraps, rather than new byte codes. But still, enabling preview affects code paths in the JVM and runtime.

If nothing uses it a JDK with and without preview enabled should exactly the same. If a plugin depends on preview features it would require preview enabled, too, so why should there be any plugin with this? I clearly said in the title: "use --enable-preview with Lucene in Java ==19 (exact)". Any other combination does not need preview enabled.

Anyways: if a plugin would like to use those features, it can just use reflection to lookup public APIs, so there's no security problem and not even the good old security manager can prevent you from doing this inside the plugin. The module system is also outside. I think you may mix up incubating and preview. Incubation features are really problematic. Preview APIs are always available (read the spec!) to any code if you use MethodHandles/Reflection/whatever, they are just not available to statically compiled code: So any code can use for example MemorySegment with Java 19 - by reflection. If I would have known about your problems with it, I would have stripped the MR-JAR classes in Lucene off their preview bit (a few lines of our JAR packaging in Gradle) and would have written the code to enable it with a simple if/then/else based on Java version.

But do whatever you think is good, but please don't complain when JDK 21 comes out and no productive tests were done. As the current code is a frequent issue for crashes and requires direct access to internal APIs, passing a --enable-preview is clearly better than anything we have now. Elasticsearch has an installer with a prepackaged JDK that is full under your control. Start testing and benchmarking now but give users a chance to test this before next year. If you really are afraid about plugins, scan their JAR files for (short) -1 (0xFFFF) after the CAFEBABE. I can't say more. The f*king release process of preview APIs is long enough. How long can it take to bring out a feature like this. Wait 5 more years?

Uwe

P.S.: Damn that I did not strip the preview bit!

mark-vieira commented 1 year ago

But do whatever you think is good, but please don't complain when JDK 21 comes out and no productive tests were done.

I think we agree here. We've been testing on Java 19 since the EA release builds were out but we haven't done any testing with preview features enabled, primarily because we didn't have any code that leveraged them. So before we would make any real decisions here we'll need to actually test this alternative implementation.

Elasticsearch has an installer with a prepackaged JDK that is full under your control. Start testing and benchmarking now but give users a chance to test this before next year.

I think what @ChrisHegarty is potentially pushing back against is making this behavior the default in Elasticsearch. There is nothing stopping users from enabling this themselves on a Java 19 runtime.

The f*king release process of preview APIs is long enough. How long can it take to bring out a feature like this. Wait 5 more years?

I assume this is hyperbole, but no one would argue for such a long incubation period. In fact, I think Elasticsearch has done a pretty good job of staying ahead of the pace here. We made the decision to baseline Elasticsearch 8.0 on Java 17 shortly after it went GA and we have an open PR to bump the bundled runtime to Java 19 already as well.

So I don't think there's any fundamental reluctance to taking advantage of new language and runtime features. The difference here is that historically we've at least waited for these features to go GA first before enabling them for all users. Perhaps as you say, a feature being "preview" is just a matter of semantics, but the point remains that we don't have an precedence here.

uschindler commented 1 year ago

assume this is hyperbole

of course this was more pointing into the direction of OpenJDK not Elasticsearch. :-)

Perhaps as you say, a feature being "preview" is just a matter of semantics, but the point remains that we don't have an precedence here.

The semantics here are clearly focused on allowing the APIs to change and not like "risky". The code for Panama is already largely inside the code modules of Java since already Java 16. Also the current code using MappedByteBuffer goes through the same code paths over and over (because the ScopedMemoryAccess which was the enabler for the new unmapping semantics was also added to MappedByteBuffer and DirectByteBuffer in Java 16, so you already use it). The --enable-preview jsut enables us to use new APIs, not new features. The code triggered all there is plain old code - this is also why it is not slower. Its the same code, just safer unmapping and larger chunks because we are 64 bits now. :-)

mark-vieira commented 1 year ago

I've add a CI job which runs against Java 19 with --enable-preview passed to all tests. Let's see what shakes out.

uschindler commented 1 year ago

That one did not work because Gradle itsself won't run with Java 19. In Lucene we only run tests with JDK 18 or JDK 19, not the build itsself.

mark-vieira commented 1 year ago

That one did not work because Gradle itsself won't run with Java 19. In Lucene we only run tests with JDK 18 or JDK 19, not the build itsself.

Yep, I got it backwards. I've fixed this so that we set the runtime Java home to JDK 19 rather than the compiler.

uschindler commented 1 year ago

Do you have any benchmarks already for comparison. Just for interest to verify my own investigations?

ChrisHegarty commented 1 year ago

Thank you @mark-vieira - to confirm explicitly in this thread; in the logs of the aforementioned CI job, I see messages from Lucene indicating that it is using the Panama Memory segment MMapDirectory implementation: Using MemorySegmentIndexInput with Java 19+. And all tests pass successfully.

uschindler commented 1 year ago

Are those logs public? Because the generated files like log files can't be seen from Jenkins.

ChrisHegarty commented 1 year ago

FTR - Being intimately involved in JEP 11, JEP 12, the Panama Foreign Memory API, and the core of the JDK, I am fully aware of the impact of non-final features on the JDK and the Java Platform.

@uschindler I am largely "with you" when it comes to the very well constrained use in this particular case - the Lucene Memory segment IndexInput implementation. Where I am raising a concern is with the broad impact of enabling preview features JVM-wide by default (this is where we are not in agreement).

We now have a CI job that exercises the new implementation - great.

Do you have any benchmarks already for comparison. Just for interest to verify my own investigations?

I am not aware of any, but I will do what I can to get some perf numbers. I'm very interested in seeing this.

P.S.: Damn that I did not strip the preview bit!

I'm not sure if this comment is serious or a joke (or somewhere in between). If the former, then please raise it over on the Lucene project.

ChrisHegarty commented 1 year ago

Are those logs public? Because the generated files like log files can't be seen from Jenkins.

I don't think that the individual test-specific result logs are public, just the overall console output from the Gradle run - which is relatively silent for successful test runs. This is the reason I posted an explicit message here confirming that the job is testing what we expect it to. I'm not sure what (if anything) we can do about this, but happy to make enquiries.

thecoop commented 1 year ago

I agree with @ChrisHegarty. Speaking more broadly, do we want ES to, by default, use preview features in the JVM? Preview features are, by definition, not finalised:

Goals: Communicate the intent that code which uses preview features from an older release of the Java SE Platform will not necessarily compile or run on a newer release. ... Ultimately, the feature will either be granted final and permanent status (with or without refinements) or be removed.

The feature may be code-complete, but that doesn't mean it's ready to use in production settings yet. If it were, it would not be a preview feature. There may be obscure bugs still around, or usability problems that need to be updated in a breaking way in the next JVM version - so if users were to update the JVM underneath ES, it could randomly break.

We are absolutely not against users having the option to use the new APIs, with all the benefits they bring, and if they wish to do so, we are happy to provide functionality so they can turn it on on their own systems. But ES is used in many thousands of deployments, processing terabytes of data that many companies critically depend on. Should we, by default (and so making the decision on our customers behalf), use features that are not yet finalised in the JVM in those production systems? We would need a very, very strong reason to do so, and I don't think we've met that bar here.

uschindler commented 1 year ago

FTR - Being intimately involved in JEP 11, JEP 12, the Panama Foreign Memory API, and the core of the JDK, I am fully aware of the impact of non-final features on the JDK and the Java Platform.

@uschindler I am largely "with you" when it comes to the very well constrained use in this particular case - the Lucene Memory segment IndexInput implementation. Where I am raising a concern is with the broad impact of enabling preview features JVM-wide by default (this is where we are not in agreement).

Hi, I would suggest to do the following:

Another idea would be to enable only that option if you run Elastcisearch in Develope and not Production mode!?

We now have a CI job that exercises the new implementation - great.

Do you have any benchmarks already for comparison. Just for interest to verify my own investigations?

I am not aware of any, but I will do what I can to get some perf numbers. I'm very interested in seeing this.

Thanks!

P.S.: Damn that I did not strip the preview bit!

I'm not sure if this comment is serious or a joke (or somewhere in between). If the former, then please raise it over on the Lucene project.

That was a joke :-) But it is tempting to strip the preview bit! Because we use reflection/methodhandles to initialize the correct provider class that creates the IndexInputs anyways, we could also force to use the new impl when exactly Java 19 is detected. But in Lucene 9.4 I wanted to have it opt-in which perfectly fits the --enable-preview.

When discussing about this with Robert Muir in summer we discussed all those options :-)

Uwe

ChrisHegarty commented 1 year ago

That was a joke :-) But it is tempting to strip the preview bit! Because we use reflection/methodhandles to initialize the correct provider class that creates the IndexInputs anyways, we could also force to use the new impl when exactly Java 19 is detected. But in Lucene 9.4 I wanted to have it opt-in which perfectly fits the --enable-preview.

The requirement to have an opt-in is good, but why does it have to be with --enable-preview? Why not a Lucene-specific opt-in mechanism (e.g. system property), rather than overloading the broader java launcher option? (assuming that you set the minor version to 00 in the build). This would allow you to enable it by default in Lucene.

uschindler commented 1 year ago

The problem is that the user has to enable preview anyways, because we don't patch away the classfile minor version. If we would compile the code with java 19, then remove preview bit, then we could do this. But as the class files only work with preview enabled there's no need for a second option and would complicate thigs. And sorry, the --enable-preview option fits perfectly for this. I still don't understand what you rpoblem with it is. If you pass it or not it won't change anything. Please read the specification, there is no risk in enabling it if you have the control about the toolchain and runtime.

ChrisHegarty commented 1 year ago

because we don't patch away the classfile minor version. If we would compile the code with java 19, then remove preview bit, then we could do this

Right. That is my question - why do you not do this (given your arguments so far in this thread)? Why impose the system-wide enablement of all preview features in all contexts of the running JVM, if you just want to enable one small code path.

uschindler commented 1 year ago

We can think about this for next version (Java 20). I have to still verify if there are any other problems with it. Of course the Lucene bootstrapping code needs to be changed. When setting this up passing --enable-preview sounds like a good option. IAnyways, enabling preview does not enable any risky features in Java 19, as it is only meant to prevent backwards features. If you show me anythink risky that gets enabled with 19, I would agree with you, but actually there's nothing: Only API features, but no special classfile features. I remember this with records and this was IMHO too much magic in the classfile format. Aren't records not enabled already in 19?

uschindler commented 1 year ago

I checked the preview features available in 19:

From that list: Other than virtual threads theres nothing outsde of panama that gets enabled.

jpountz commented 1 year ago

While I'm supportive of testing and benchmarking this new Panama-based directory, I agree with Chris, Mark and Simon on not enabling it in production by default.

uschindler commented 1 year ago

FYI, I forgot to mention this here. Enabling preview will be obsolete with Lucene 9.5: https://github.com/apache/lucene/pull/12033

So Panama will be used automatically when Java 19 (and soon Java 20, see https://github.com/apache/lucene/pull/12042) was detected. Whatever comes first, Lucene 9.5 or Java 20, you will get both variants with Lucene 9.5.

So I think we can close this issue?

ChrisHegarty commented 1 year ago

We might want to document, or otherwise note, the presence of the org.apache.lucene.store.MMapDirectory.enableMemorySegments system property, but otherwise there is nothing more to be done in this issue.

jpountz commented 1 year ago

Neither tests nor benchmarks have noticed a difference after this change, so I'm keen on not documenting this (super expert) system property. Closing. Thanks @uschindler!

uschindler commented 1 year ago

By the way, I would suggest to change the code in this other PR where it applies some regex to the system.out/err output to suppress the messages. Instead, I'd suggest to use log4j-jul module (https://logging.apache.org/log4j/log4j-2.17.0/log4j-jul/index.html) so all Lucene logging goes through Log4J. Then its easy to suppress the warnings and info messages.

(I was a bit shocked about this strange filtering of system out)

uschindler commented 1 year ago

FYI, I wanted to show you this commit, unfortunately in this other forked repository. They use log4j-jul module and configure it in LogConfigurator class: https://github.com/opensearch-project/OpenSearch/commit/beb99150a84d54bb26182d50e363beea79d1a09a

You can then simply set loglevel org.apache.lucene.store package to "WARN"/"ERROR"/"NONE" in your logging config and users can configure it as usual. It is generally recommended to enable log4j-jul because also other parts of Lucene now seldomly log messages on failures/error or misconfiguration.

Sorry for posting this here.

rjernst commented 1 year ago

Thanks for the additional note @uschindler. I captured the suggestion in https://github.com/elastic/elasticsearch/issues/94613