PrefectHQ / prefect

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
https://prefect.io
Apache License 2.0

S3 Storage should allow to pass boto3 extra client parameters #5658

Open davzucky opened 2 years ago

davzucky commented 2 years ago

Description

This is the same type of issue as in prefect-aws https://github.com/PrefectHQ/prefect-aws/issues/25. A lot of companies are using custom CA certificates or S3 clones like Minio. The class prefect.blocks.storage.S3StorageBlock needs to support these extra parameters. This is more of an extension of the current implementation.
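To make the request concrete, here is a minimal sketch of the kind of pass-through being asked for. S3ClientConfig is a hypothetical class written for illustration, not an existing Prefect block, and the endpoint and certificate path are placeholders. The only real API it relies on is boto3.Session().client, which accepts endpoint_url and verify.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class S3ClientConfig:
    """Hypothetical config holder; not an existing Prefect class."""

    bucket: str
    # Extra keyword arguments forwarded unchanged to boto3's client factory,
    # e.g. endpoint_url for an S3 clone or verify for a custom CA bundle.
    client_kwargs: Dict[str, Any] = field(default_factory=dict)

    def create_client(self):
        # Imported lazily so the config can be built without boto3 installed.
        import boto3

        return boto3.Session().client("s3", **self.client_kwargs)


config = S3ClientConfig(
    bucket="my-bucket",  # placeholder bucket name
    client_kwargs={
        "endpoint_url": "http://localhost:9000",  # placeholder Minio endpoint
        "verify": "/path/to/ca-bundle.pem",  # placeholder CA certificate path
    },
)
```

Calling config.create_client() would then build the client with those parameters, assuming boto3 is installed and credentials are available in the environment.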

zanieb commented 2 years ago

Note that the S3StorageBlock will move to the prefect-aws collection and a generic storage block using fsspec will be the primary one exposed in the core library. I presume the fsspec implementation takes the necessary config.

davzucky commented 2 years ago

@madkinsz, it does make sense to move the S3StorageBlock to prefect-aws. Yes, fsspec supports these parameters as client_kwargs when you initialize the filesystem. Looking forward to being able to play with the next version.
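For reference, a rough sketch of how client_kwargs reaches the S3 filesystem through fsspec. The helper names s3_storage_options and open_s3_filesystem are made up for this example, and the endpoint is a placeholder. It assumes fsspec and s3fs are installed when the filesystem is actually opened.

```python
from typing import Any, Dict, Optional


def s3_storage_options(
    endpoint_url: Optional[str] = None,
    ca_bundle: Optional[str] = None,
) -> Dict[str, Any]:
    """Build fsspec storage options (hypothetical helper)."""
    client_kwargs: Dict[str, Any] = {}
    if endpoint_url:
        client_kwargs["endpoint_url"] = endpoint_url
    if ca_bundle:
        # "verify" is the botocore client parameter for a custom CA bundle.
        client_kwargs["verify"] = ca_bundle
    # Only emit the key when there is something to pass through.
    return {"client_kwargs": client_kwargs} if client_kwargs else {}


def open_s3_filesystem(**storage_options: Any):
    # Imported lazily; needs fsspec and s3fs to be installed.
    import fsspec

    # s3fs forwards client_kwargs to the underlying botocore client.
    return fsspec.filesystem("s3", **storage_options)


options = s3_storage_options(endpoint_url="http://localhost:9000")
```

With the libraries installed, open_s3_filesystem(**options) would return an S3 filesystem pointed at the custom endpoint.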

jacksund commented 2 years ago

Thanks for adding this to the roadmap! I'm in the same spot as Minio users: I'm using Digital Ocean and need to pass extra parameters to boto3.Session.client (specifically endpoint_url).

rmcsqrd commented 2 years ago

+1 for digital ocean support.

I ran into issues with the S3StorageBlock recently because DO uses an https:// URL instead of the s3:// scheme. In the RemoteFileSystem definition, DO buckets are therefore inferred to use fsspec.implementations.http.HTTPFileSystem instead of s3fs.core.S3FileSystem.

I ended up defining a custom collection but it relies on hackily prepending the basepath kwarg with s3:// when I invoke RemoteFileSystem. It would be nice to override the scheme inference part of RemoteFileSystem in addition to being able to control endpoint_url.
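A minimal sketch of that prepending workaround, for anyone hitting the same thing. ensure_s3_scheme is a hypothetical helper, not part of Prefect or fsspec. It only handles scheme-less paths and deliberately refuses other schemes rather than guessing how to rewrite them.

```python
def ensure_s3_scheme(basepath: str) -> str:
    """Return basepath with an s3:// scheme (hypothetical helper)."""
    scheme, separator, _ = basepath.partition("://")
    if not separator:
        # No scheme at all: prepend s3:// so scheme inference picks s3fs.
        return "s3://" + basepath.lstrip("/")
    if scheme == "s3":
        return basepath
    # An http(s) URL would be inferred as HTTPFileSystem; fail loudly instead.
    raise ValueError(f"unsupported scheme for S3 storage: {scheme}://")


basepath = ensure_s3_scheme("my-bucket/flows")  # placeholder bucket and prefix
```

The result can be passed as the basepath, with the provider's endpoint supplied separately through client_kwargs.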

chris-aeviator commented 1 year ago

RemoteFileSystem also does not pick up the endpoint_url set via

{
…
  "client_kwargs": {
    "endpoint_url": "http://my-s3.com:9000"
  }
}

When deploying, boto complains about not being able to connect to

http://s3.my-region.amazonaws.com/prefect/s3block.py

even though my config 100% follows the prefect readme.

AyrtonDare commented 1 year ago

Is this no longer planned to be implemented?