gchq / sleeper

A cloud-native, serverless, scalable, cheap key-value store
Apache License 2.0

Standardise on using AWS SDK version 2 everywhere #1389

Open gaffer01 opened 12 months ago

gaffer01 commented 12 months ago

Background

The codebase currently uses a mixture of v1 and v2 of the AWS SDK.

Description

We would like to use v2 of the AWS SDK everywhere and stop using v1.

v1 of the AWS SDK for Java reaches end-of-support on 31st December 2025: https://aws.amazon.com/blogs/developer/announcing-end-of-support-for-aws-sdk-for-java-v1-x-on-december-31-2025/

Note that the S3A library that we use has already upgraded to v2 (https://issues.apache.org/jira/browse/HADOOP-18073).

This epic may also be a good time to review whether we can upgrade to Java 17 everywhere, although this would require EMR to support Java 17 fully:

We've split out some issues for the AWS SDK upgrade:

patchwork01 commented 2 days ago

This is on hold because AWS EMR still uses a version of Hadoop that depends on AWS SDK v1. If we also use AWS SDK v2 to interact with S3 outside of Hadoop, the jar for the bulk import starter lambda ends up with both SDK versions in it and becomes too big to deploy as a lambda.

Hadoop 3.4 uses AWS SDK v2, and the jar is likely to be smaller if we don't have both versions of the SDK on the classpath. AWS EMR is due for a minor release at the end of October, and this may include a Hadoop upgrade.
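As a rough way to confirm whether a given jar or runtime still has both SDK generations on the classpath, something like the sketch below could be run against it. This is not part of Sleeper: the class name `SdkClasspathCheck` and its methods are hypothetical. It assumes the S3 module of each SDK is the one being pulled in, and uses the public S3 client types (`com.amazonaws.services.s3.AmazonS3` for v1, `software.amazon.awssdk.services.s3.S3Client` for v2) as markers.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Sleeper: reports which AWS SDK
// generations are visible on the current classpath.
public class SdkClasspathCheck {

    // Assumption: the S3 client types are used as markers for each SDK generation.
    static final String V1_MARKER = "com.amazonaws.services.s3.AmazonS3";
    static final String V2_MARKER = "software.amazon.awssdk.services.s3.S3Client";

    // Returns true if the named class can be located, without initialising it.
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, SdkClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Maps a label for each SDK generation to whether its marker class was found.
    static Map<String, Boolean> detect() {
        Map<String, Boolean> found = new LinkedHashMap<>();
        found.put("AWS SDK v1 (S3)", isPresent(V1_MARKER));
        found.put("AWS SDK v2 (S3)", isPresent(V2_MARKER));
        return found;
    }

    public static void main(String[] args) {
        detect().forEach((sdk, present) -> System.out.println(sdk + ": " + present));
    }
}
```

Running this with the lambda jar on the classpath would show whether both generations are present. If both lines report `true`, the jar is carrying two SDKs, which is the situation described above.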

We can revisit this once AWS EMR uses a version of Hadoop that uses AWS SDK v2.