Closed zyl891229 closed 4 months ago
Ya, you are correct, @zyl891229 . That's the community direction in Apache ORC 2.0 and Apache Spark 4.0.
Although Apache ORC 1.9.x will be supported according to the three-year support policy, new features and improvements are out of scope for backporting because Apache ORC follows the Semantic Versioning policy.
May I ask
@dongjoon-hyun thanks for the response.
What is your environment? We want to use a data lake (Iceberg/Hudi) in production, with data files in ORC format. We use a high zstd compression level to reduce storage costs.
Why can't you use Java 17+? The production cluster environment is Java 8 (Spark 3.3) and cannot be upgraded for a while.
Which Apache ORC versions are you using currently? 1.8.3. This version cannot change the zstd level (it uses the airlift compression library) and performs worse than zstd-jni (luben zstd-jni).
How do you use the Apache ORC Java library? Writes are implemented by the data lake SDK, which uses ORC.
Thank you for the info.
Here are my thoughts.
1. Mismatched Versions: The Apache ORC community intentionally aligns its dev cycles with Apache Spark. In principle, neither the Apache Spark nor the ORC community recommends using Apache ORC 1.8 with Spark 3.3, because that combination is beyond the communities' test coverage.
2. End-Of-Life: Apache Spark 3.3 has reached end-of-support status, and I was the release manager for that EOL release (Apache Spark 3.3.4). Apache ORC has been supporting Apache ORC 1.7.x, and 1.7.x will reach EOL in six months. Only one or two bug fixes are expected (if any).
Given (1) and (2), I don't think there is a way for the next release of either the Apache Spark or the ORC community to help you officially. Since you are currently on your own build, my recommendation is to cherry-pick the change and try it at your own risk.
Let me close this issue first. We can continue to discuss this topic in this thread.
As we know, ORC 2.0 drops Java 8 and makes Java 17 the default, but it also adds support for the new zstd implementation [zstd-jni] at the same time. We really need the new zstd library (configurable level and better performance), but we also want to use it on Java 8.
thanks
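For anyone skimming the thread, the constraint discussed above can be sketched as a small check. This is only an illustration of the reasoning, not an official compatibility matrix: the helper `zstd_writer_options` is hypothetical, the version cut-offs are the ones stated in this thread, and the property name `orc.compression.zstd.level` is an assumption that should be verified against the configuration docs of the ORC release you actually run.

```python
# Sketch only: the helper name, the version cut-offs, and the zstd level
# property name are assumptions taken from (or inferred from) this thread.

def parse_major(version):
    """Return the major component of a dotted version, e.g. '1.8.3' -> 1."""
    return int(version.split(".")[0])


def zstd_writer_options(orc_version, java_major, level=None):
    """Build ORC writer properties for ZSTD.

    Raises ValueError when the requested combination cannot honour a custom
    compression level, mirroring the situation described in the thread.
    """
    options = {"orc.compress": "ZSTD"}

    if parse_major(orc_version) < 2:
        # Per the thread: ORC 1.8.3 uses the airlift codec, whose zstd level
        # cannot be changed.
        if level is not None:
            raise ValueError(
                "ORC %s cannot change the zstd level" % orc_version
            )
        return options

    if java_major < 17:
        # Per the thread: ORC 2.0 dropped Java 8 and targets Java 17.
        raise ValueError("ORC 2.x needs Java 17 or newer")

    if level is not None:
        # Assumed property name for the zstd-jni level in ORC 2.x.
        options["orc.compression.zstd.level"] = str(level)
    return options


if __name__ == "__main__":
    # The reporter's setup (Java 8, ORC 1.8.3) only works without a custom level.
    print(zstd_writer_options("1.8.3", java_major=8))
    # A Java 17 cluster on ORC 2.0 can request a higher level.
    print(zstd_writer_options("2.0.0", java_major=17, level=9))
```

With the reporter's environment (Java 8, ORC 1.8.3), asking for a custom level fails in this sketch, which is the gap that a private cherry-pick of the zstd-jni change would have to close.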