apache / beam

Apache Beam is a unified programming model for Batch and Streaming data processing.
https://beam.apache.org/
Apache License 2.0

[Task]: Upgrade gcs-connector libraries to 3.x #31896

Open clairemcginty opened 1 month ago

clairemcginty commented 1 month ago

What needs to happen?

The com.google.cloud.bigdataoss:{gcsio, gcs-connector, util, util-hadoop} libraries have a new 3.x release line that supports the vectored IO APIs: https://github.com/GoogleCloudDataproc/hadoop-connectors/releases

However, 3.x includes some significant breaking changes that will affect Beam's GcsUtil class. It also drops Hadoop 2 support.

Issue Priority

Priority: 2 (default / most normal work should be filed as P2)


clairemcginty commented 1 month ago

gcs-connector 3.x also drops Java 8 support, so this upgrade is blocked until Beam drops Java 8 as well.
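
To make the Java constraint concrete, here is a minimal sketch of a runtime check that could guard code paths depending on the 3.x connector. It is not taken from Beam or from the connector: the class name `JavaVersionGuard`, its methods, and the minimum version of 11 are all assumptions for illustration. The issue only states that Java 8 is no longer supported, so the real minimum should be confirmed against the connector's release notes.

```java
// Hypothetical helper, not part of Beam or the gcs-connector libraries.
public final class JavaVersionGuard {

    private JavaVersionGuard() {}

    // Parses a "java.specification.version" value into a feature version.
    // Java 8 and earlier report "1.x" (for example "1.8"); Java 9+ report "9", "11", ...
    static int featureVersion(String specVersion) {
        String[] parts = specVersion.split("\\.");
        int first = Integer.parseInt(parts[0]);
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    // Returns true when the given runtime meets the required minimum feature version.
    static boolean isSupported(String specVersion, int minimumFeatureVersion) {
        return featureVersion(specVersion) >= minimumFeatureVersion;
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        int assumedMinimum = 11; // assumption: verify against the connector release notes
        if (isSupported(spec, assumedMinimum)) {
            System.out.println("Java " + spec + " meets the assumed minimum of " + assumedMinimum);
        } else {
            System.out.println("Java " + spec + " is below the assumed minimum of " + assumedMinimum);
        }
    }
}
```

The minimum is passed as a parameter rather than hard-coded so the check stays valid if the assumed floor turns out to be different.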