apache / hop

Hop Orchestration Platform
https://hop.apache.org/
Apache License 2.0

[Feature Request]: upgrade AWS SDK for VFS S3 to v2 #2333

Open hansva opened 1 year ago

hansva commented 1 year ago

What would you like to happen?

Migration: https://issues.apache.org/jira/browse/HOP-4525

Upgrade the AWS SDK used by VFS S3 to v2.

https://mvnrepository.com/artifact/software.amazon.awssdk/s3

Issue Priority

Priority: 2

Issue Component

Component: VFS

hansva commented 1 year ago

Additional requirements from: https://issues.apache.org/jira/browse/HOP-4452

MinIO is a free and open-source S3-compatible object store. It is easy to get going in Docker containers and is very performant. It would be wonderful if Hop could read and write to the object store. In theory, this should be easy. Most of the other software I see out there with S3 connections allows two connection variables to be set that point at MinIO instead of Amazon S3, plus another setting for the path access style. Please see below:

https://docs.dremio.com/software/data-sources/s3/#configuring-s3-for-minio

Dremio has this cool way of allowing access to S3-compatible object stores like MinIO, using two connection flags: fs.s3a.path.style.access = true and fs.s3a.endpoint = minio_server:9000

These appear to be settings that Hadoop jars are familiar with. Are they supported in Hop's VFS in some way, so that it can read and write to MinIO while essentially speaking "S3" to it?
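To make the two flags concrete, here is a minimal sketch of what the endpoint and path-style settings change in the request URL. It is plain Java with no AWS SDK or Hop code involved; the class name S3UrlStyle, both helper methods, and the bucket and key values are hypothetical and only for illustration. It assumes the endpoint is given as scheme://host:port without a trailing slash.

```java
// Illustration only: how an S3 object URL is formed in each addressing style.
// S3UrlStyle and its methods are hypothetical helpers, not part of Hop or the AWS SDK.
class S3UrlStyle {

    /** Path-style: the bucket is the first path segment, so the endpoint host is used as-is. */
    static String pathStyle(String endpoint, String bucket, String key) {
        return endpoint + "/" + bucket + "/" + key;
    }

    /** Virtual-hosted style: the bucket is prepended to the endpoint host as a subdomain. */
    static String virtualHostedStyle(String endpoint, String bucket, String key) {
        int sep = endpoint.indexOf("://");
        String scheme = endpoint.substring(0, sep);
        String authority = endpoint.substring(sep + 3); // host[:port]
        return scheme + "://" + bucket + "." + authority + "/" + key;
    }

    public static void main(String[] args) {
        String endpoint = "http://minio_server:9000"; // value from the Dremio example above
        String bucket = "my-bucket";                  // assumed bucket name
        String key = "data/file.csv";                 // assumed object key

        System.out.println(pathStyle(endpoint, bucket, key));
        System.out.println(virtualHostedStyle(endpoint, bucket, key));
    }
}
```

With virtual-hosted addressing the client has to resolve a host name like my-bucket.minio_server, which a simple MinIO container setup usually has no DNS entry for. That is presumably why these tools expose a path-style switch alongside the custom endpoint.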

This is not just a Dremio thing. There are all kinds of hits for other tools, such as Spark, that use similar settings under "hadoopConfiguration.set(...": https://www.jitsejan.com/setting-up-spark-with-minio-as-object-storage

Pivotal Greenplum does the same here: https://gpdb.docs.pivotal.io/6-3/pxf/s3_objstore_cfg.html

hansva commented 1 year ago

Related ticket https://issues.apache.org/jira/browse/HOP-3474

hansva commented 1 year ago

Related ticket https://issues.apache.org/jira/browse/HOP-3417

avizingdbronson commented 1 year ago

As a developer, it would be great to have seamless credential handling from my workstation to our containers that use instance roles. AWS calls this Federated Roles as described here:

https://docs.aws.amazon.com/AmazonS3/latest/userguide/AuthUsingTempFederationToken.html

usbrandon commented 7 months ago

Related (duplicate) https://github.com/apache/hop/issues/3644