danepitkin closed this issue 2 months ago
The discussion thread on dev@arrow.apache.org: https://lists.apache.org/thread/s07jx58yw4mkl54t3bkggnyg0sftcrr8
In addition, the following dependencies are pinned for JDK8:
To enable support for JDK21, this draft PR forces us to pin one Mockito dependency version for JDK8 and another version for JDK11+.
Error Prone 2.11+ requires JDK11 as its minimum supported version, so it was necessary to pin a JDK8-compatible version.
Apache Spark has dropped support for Java 8 and 11 on the main branch (targeting a 4.0 release) https://github.com/apache/spark/pull/43005
Edit: Spark 4.0 release timeframe is 2024-06[1]
[1] https://lists.apache.org/thread/xhkgj60j361gdpywoxxz7qspp2w80ry6
Netty 5.0 will remove support for Java 8 https://github.com/netty/netty/pull/10650
The current consensus on the Arrow mailing list[1] is to postpone Java 8 deprecation and to revisit it when Spark releases 4.0, which deprecates Java 8 (~2024-06).
[1] https://lists.apache.org/thread/kml53f81z1oskcf00xl7wlbcjssmn91g
Apache Derby continuously drops support for older JDK versions https://github.com/apache/arrow/pull/38813
My apologies!
I accidentally unpinned this issue because I thought I had pinned it just for me. I just repinned it.
Apache Iceberg is considering dropping java 8 support https://lists.apache.org/thread/ntrk2thvsg9tdccwd4flsdz9gg743368
New mailing list discussion: https://lists.apache.org/thread/65vqpmrrtpshxo53572zcv91j1lb2y8g
Apologies, I also unpinned it thinking this was just my GitHub view :joy:
I've looked into this and have some notes.
When compiling Java code in Java 9 or higher, you can use both the classpath and the module-path.
- Any library on the classpath becomes part of the UNNAMED module.
- Any library on the module-path with a module-info.java file will be a Java module as expected.
- Any library on the module-path without a module-info.java file will be treated as an automatic Java module. The names of the modules are dependent on the name of the Jar file. This creates deployment issues.
Maven may choose to use both the classpath and module-path.
- If the project has a module-info.java file, then all libraries with a module-info.java file will be placed in the module-path. All other libraries will be on the classpath (this can be configured).
- Depending on configuration, Maven may also place libraries without a module-info.java file in the module-path. This will cause them to become automatic Java modules.
A first step in migrating to Java 11 would be to remove (or hide) the module-info.java files. This would cause Maven to put everything on the classpath and not cause any build issues. We would not be distributing any module information, so consumers would have to either treat Arrow modules as automatic Java modules or put them on the classpath.
Without the module-info.java files, IntelliJ can easily resolve dependencies and is able to run unit tests.
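To check which of the three categories above a class actually ended up in at runtime, something like the following can help. It is a minimal sketch that uses only the standard `java.lang.Module` and `java.lang.module.ModuleDescriptor` APIs; the class name `ModuleKind` and the method `describe` are made up for illustration and are not part of Arrow.

```java
import java.lang.module.ModuleDescriptor;

public class ModuleKind {
    /** Reports whether a class lives in the unnamed module, an automatic module, or an explicit module. */
    public static String describe(Class<?> type) {
        Module module = type.getModule();
        if (!module.isNamed()) {
            // Everything loaded from the classpath ends up in an unnamed module.
            return "unnamed module (classpath)";
        }
        ModuleDescriptor descriptor = module.getDescriptor();
        return descriptor.isAutomatic()
            ? "automatic module " + module.getName()   // JAR on the module-path without a module descriptor
            : "explicit module " + module.getName();   // module with a compiled module-info
    }

    public static void main(String[] args) {
        // java.lang.String always belongs to the explicit module java.base.
        System.out.println(describe(String.class));
        // The result for this class depends on whether it was launched from the classpath or the module-path.
        System.out.println(describe(ModuleKind.class));
    }
}
```

Calling `describe` on an Arrow class from a consumer project would show how that project's build placed the Arrow JAR.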
Longer term, we should include proper module-info.java files in all Arrow modules. Not all of Arrow's dependencies have a module-info.java file, such as flatbuffers-java. It is not reliable to treat these as automatic Java modules during build, since that depends on the file name. We could either shade in the Java classes or keep such dependencies on the classpath. If they are on the classpath, then we cannot declare any dependency on them in the module-info.java file and consumers may need extra flags when compiling/running projects depending on Arrow.
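The file-name dependency mentioned above can be observed with the JDK's own `ModuleFinder`. This is a rough sketch, not anything from the Arrow build: it writes a manifest-only JAR under a given file name and asks the JDK which automatic module name it derives. The class `AutomaticModuleNames`, the method `derivedName`, and the JAR file names are hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.lang.module.ModuleFinder;
import java.lang.module.ModuleReference;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.Attributes;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class AutomaticModuleNames {
    /** Creates a manifest-only JAR with the given file name and returns the automatic module name derived for it. */
    public static String derivedName(String jarFileName) {
        try {
            Path dir = Files.createTempDirectory("automatic-module");
            Path jar = dir.resolve(jarFileName);
            Manifest manifest = new Manifest();
            // No Automatic-Module-Name attribute is set, so the name must come from the file name.
            manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
            try (OutputStream out = Files.newOutputStream(jar);
                 JarOutputStream jarOut = new JarOutputStream(out, manifest)) {
                // The manifest is the only entry; no classes are needed for this demonstration.
            }
            ModuleReference reference = ModuleFinder.of(jar).findAll().iterator().next();
            return reference.descriptor().name();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical file names: the same library yields different module names once the file is renamed.
        System.out.println(derivedName("flatbuffers-java-1.12.0.jar")); // flatbuffers.java
        System.out.println(derivedName("flatbuffers.jar"));             // flatbuffers
    }
}
```

A `requires` clause written against one of these names stops resolving as soon as the JAR is repackaged or renamed. A JAR that declares `Automatic-Module-Name` in its manifest is not affected by this.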
I recommend shading in legacy dependencies. This eases the burden for consumers of Arrow libraries. We would not expose packages from those libraries. Consumers can simply add Arrow libraries to the module path without needing flags to grant Arrow modules access to the UNNAMED module.
Some dependencies are obsolete, such as jsr305. We should migrate away from obsolete dependencies. The ThreadSafe annotation could have some use, but it is becoming increasingly unlikely that anyone would consume it.
Do you know why module-info.java files were added in the first place? It seems weird to have to remove them because Arrow is moving to Java 9+, and I guess it could be considered as a public API breakage?
I also haven't observed any change of behavior from "Maven" based on the presence or absence of module-info.java either. Maybe it's a plugin thing? Do you have pointers?
> Do you know why module-info.java files were added in the first place? It seems weird to have to remove them because Arrow is moving to Java 9+, and I guess it could be considered as a public API breakage?
> I also haven't observed any change of behavior from "Maven" based on the presence or absence of module-info.java either. Maybe it's a plugin thing? Do you have pointers?
The module-info.java files were added to support JPMS in Arrow 17.
When running surefire and failsafe, Maven will put JARs with a module-info.class file in the module-path instead of the classpath (when running on a JDK newer than 8). IIRC there's an option to force using the classpath instead.
> The module-info.java files were added to support JPMS in Arrow 17.
Did you mean Arrow 16? Still, why was JPMS support needed? Other projects like Iceberg and Parquet do not provide JPMS support. The description of #13072 goes over some of the supposed benefits of JPMS but names no concrete issue the project is trying to solve, and now we seem to be discussing (temporarily) removing JPMS support in order to move to Java 11. Something doesn't add up.
@jduo There is no option to force using the classpath. You are probably thinking of "useModulePath", which can be true or false. When you target Java 9 or higher, that only controls what happens to dependencies that do not have a module-info.java file. Maven will always use the module-path for dependencies with a module-info.java file.
This work is intended for Arrow 18. I was looking for a way to split up the work. I am not suggesting removing a feature from Arrow for Arrow 18.
There are issues with the current module-info.java files. They are making use of automatic module names, which are based off the name of the Jar file. This is not reliable, and also needs to be fixed.
Given the sensitivity here, it looks like everything must be solved in one commit.
> @jduo There is no option to force using the classpath. You are probably thinking of "useModulePath", which can be true or false. When you target Java 9 or higher, that only controls what happens to dependencies that do not have a module-info.java file. Maven will always use the module-path for dependencies with a module-info.java file.
But since code is tested with Java 11 and higher, doesn't it mean that this already works?
> There are issues with the current module-info.java files. They are making use of automatic module names, which are based off the name of the Jar file. This is not reliable, and also needs to be fixed.
That seems to be a separate issue from this one, doesn't it?
This didn't show up yet since the target version of Java is 1.8.
The Maven compiler plugin cares about what the target version of Java is. Currently Arrow targets Java 1.8, so all libraries are placed on the classpath (even if using JDK 11). When targeting Java 9 or higher, Maven compiler plugin will start to look for "module-info.java" files and decide on whether libraries belong in the classpath or module-path.
Use of automatic modules is a separate issue, but may get higher visibility once Java 11 is the minimum for Arrow. More users may start to make use of the JPMS modules.
Switching Arrow to Java 11 is not as simple as changing only the target version of Java. That will cause the Maven compiler plugin to use the module-path for most dependencies and expose issues with the existing module-info.java files. I suspect that the module-info.java files were only tested at runtime (with unit tests), not at compile time, since the target version of Java was always 1.8. Trying to verify this.
I've looked into the CI builds using JDK 11. Those builds still target Java 1.8 when compiling Java code.
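One way to double-check which Java version a build targeted, independent of the JDK that ran it, is to read the class-file header of a compiled class. This is only a sketch using standard JDK I/O; `ClassFileTarget`, `majorVersion` and `javaRelease` are illustrative names, not part of Arrow or Maven. Major version 52 corresponds to Java 8 and 55 to Java 11.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ClassFileTarget {
    /** Returns the class-file major version of the given class, e.g. 52 for Java 8 or 55 for Java 11. */
    public static int majorVersion(Class<?> type) {
        String resource = "/" + type.getName().replace('.', '/') + ".class";
        try (InputStream in = type.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IllegalArgumentException("No class file found for " + type.getName());
            }
            DataInputStream data = new DataInputStream(in);
            if (data.readInt() != 0xCAFEBABE) {
                throw new IllegalArgumentException("Not a class file: " + resource);
            }
            data.readUnsignedShort();        // minor version, not needed here
            return data.readUnsignedShort(); // major version
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Maps a class-file major version to the Java release that introduced it (52 -> 8, 55 -> 11). */
    public static int javaRelease(int majorVersion) {
        return majorVersion - 44;
    }

    public static void main(String[] args) {
        int major = majorVersion(String.class);
        System.out.println("java.lang.String targets Java " + javaRelease(major));
    }
}
```

Run against a class from an Arrow JAR built by the JDK 11 CI job, a result of 52 would be consistent with the build still targeting Java 1.8.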
As the proof is in the pudding, I took a stab at dropping JDK 8 support and created a pull request.
Issue resolved by pull request 43139 https://github.com/apache/arrow/pull/43139
Describe the enhancement requested
[1] https://en.wikipedia.org/wiki/Java_Platform_Module_System
[2] https://github.com/apache/arrow/blob/main/dev/release/verify-release-candidate.sh#L571
[3] https://github.com/apache/arrow/pull/37723#discussion_r1330578945
[4] https://github.com/apache/arrow/pull/13072#issuecomment-1731904205
[5] https://newrelic.com/sites/default/files/2023-04/new-relic-2023-state-of-the-java-ecosystem-2023-04-20.pdf
Post-upgrade tasks
Component(s)
Java