Open Fokko opened 4 months ago
I'm going to propose this is done with a cull of the hadoop-2 profile, with other cleanup code done more incrementally.
Or would you want HadoopStreams cleaned up at the same time? It'd be the nice tangible "this is why it is worthwhile" change?
Hey @steveloughran! As a first PR, I'd love to remove the Hadoop 2 profile and the error-prone reflection. Next, we can do incremental cleanup. The discussion has been open on the dev-list for some time now, let me conclude it over there.
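For anyone following along who hasn't looked at why the reflection gets called error-prone, here is a minimal sketch of the general pattern. Everything in it (`LegacyStream`, `BufferStream`, `OldStream`, `NewStream`, `ReflectionVsDirect`) is a hypothetical stand-in, not the actual Parquet or Hadoop code. The assumption is only that an older API lacks a `read(ByteBuffer)` method that a newer one has.

```java
import java.lang.reflect.Method;
import java.nio.ByteBuffer;

// Hypothetical stand-ins for illustration only; these are NOT Hadoop or Parquet classes.
class ReflectionVsDirect {

    /** The "old" API: only byte[] reads are available. */
    public interface LegacyStream {
        int read(byte[] buf);
    }

    /** The "new" API: adds a ByteBuffer read on top of the old one. */
    public interface BufferStream extends LegacyStream {
        int read(ByteBuffer buf);
    }

    public static class OldStream implements LegacyStream {
        @Override
        public int read(byte[] buf) {
            return 0;
        }
    }

    public static class NewStream implements BufferStream {
        @Override
        public int read(byte[] buf) {
            return 0;
        }

        @Override
        public int read(ByteBuffer buf) {
            return 0;
        }
    }

    /**
     * Reflection-based probing: the method name is a string, so a typo or a
     * signature change is only discovered at runtime, and usually shows up as
     * a silent switch to the fallback path.
     */
    static String viaReflection(Object stream) {
        try {
            Method m = stream.getClass().getMethod("read", ByteBuffer.class);
            m.invoke(stream, ByteBuffer.allocate(4));
            return "bytebuffer";
        } catch (NoSuchMethodException e) {
            return "fallback";
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /**
     * Direct call: possible once only the newer API has to be supported.
     * The compiler checks the method exists, so there is nothing to probe.
     */
    static String direct(LegacyStream stream) {
        if (stream instanceof BufferStream) {
            ((BufferStream) stream).read(ByteBuffer.allocate(4));
            return "bytebuffer";
        }
        stream.read(new byte[4]);
        return "fallback";
    }

    public static void main(String[] args) {
        System.out.println(viaReflection(new NewStream()));
        System.out.println(viaReflection(new OldStream()));
        System.out.println(direct(new NewStream()));
    }
}
```

Both paths behave the same for these stand-ins. The difference is that the direct version fails at compile time if the method goes away, while the reflective one quietly takes the fallback.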
+1; will submit both. One thing to consider here is actually dropping the hadoop 3 version to 3.3.0 to guarantee all API/tests are against that version. Avoids any accidental use of newer classes/methods/constants etc.
> One thing to consider here is actually dropping the hadoop 3 version to 3.3.0 to guarantee all API/tests are against that version. Avoids any accidental use of newer classes/methods/constants etc.
Yes, I was also thinking about that. I like that idea (or testing against both 3.3.x and 3.4.x).
I've actually been thinking about having a format-test module in Hadoop, containing basic Parquet, Avro, etc. tests, which we then run against object stores through the S3A, ABFS and GCS connectors. That way we can identify regressions fast and test the development branches against live cloud infrastructure. There is also the option of a mini in-process HDFS cluster to test file R/W there... that can be done in Parquet today.
w.r.t format testing, got some more thoughts there which would actually be
Someone still needs to provide keys for the target stores, so it can't be run in the public CI tests... a recurrent PITA there.
Describe the enhancement requested
Remove Hadoop 2 support
There is fallback logic in case it needs to seek within a file.
Component(s)
No response