NVIDIA / aistore

AIStore: scalable storage for AI applications
https://aistore.nvidia.com
MIT License
1.23k stars, 164 forks

Support custom s3 endpoint (minio, wasabi) #94

Closed: mrhamburg closed this issue 2 years ago

mrhamburg commented 2 years ago

Hi!

Going through the codebase (the S3 regex used to derive the endpoint and the tight coupling to the AWS SDK), I assume there is currently no way to set up a custom S3 backend such as MinIO or Wasabi.

If that assumption is correct, I would like to propose adding the ability to configure a custom S3 backend.

VirrageS commented 2 years ago

Hey @mrhamburg, that's a good catch. Yes, we currently don't support other S3 backends. Looking briefly at the code, I think we could add a configuration variable that specifies the endpoint for the S3 backend instead of using the default one. aws-sdk-go seems to support this via aws.Config.Endpoint, so we would basically pass it there.

The biggest problem I see is testing. We don't currently have access to any S3 backend other than AWS S3 itself. Do you have an idea of how we could test it?

mrhamburg commented 2 years ago

Hey @VirrageS,

You can test the integration the same way many other S3-compatible clients (such as s3cmd or rclone) are tested: spin up a MinIO Docker container in your CI pipeline and use it as the endpoint. This is similar to how you test azblob (sort of). If you at least cover MinIO, it should also work with many other providers. The differences I usually see are that some providers do not support multipart upload/download, which AIStore does not need, and that the ETag hashing algorithm sometimes differs, which in my experience is an edge case. Providers generally follow the S3 specification because they have an incentive to stay fully compatible with what AWS is doing (for better interoperability). In short: cover MinIO, and the rest is either implicitly tested or simply not supported.

I took the approach from the MinIO docs to kick off this feature. I am by no means a Go programmer, and if I were to create a PR for this, it would be my first lines of Go ever. So go easy on me, but this is part of what I have in mind:

File: https://github.com/NVIDIA/aistore/blob/master/ais/backend/aws.go

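// Use a custom S3-compatible endpoint when AWS_ENDPOINT is set;
// otherwise keep the existing AWS code path.
if endpoint := os.Getenv("AWS_ENDPOINT"); endpoint != "" {
    awsConf.Region = aws.String(os.Getenv("AWS_DEFAULT_REGION"))
    awsConf.Endpoint = aws.String(endpoint)
    awsConf.Credentials = credentials.NewStaticCredentials(os.Getenv("AWS_ACCESS_KEY_ID"), os.Getenv("AWS_SECRET_ACCESS_KEY"), "")
    awsConf.DisableSSL = aws.Bool(false)
    // false means virtual-hosted-style bucket URLs; some S3-compatible
    // servers may need path-style addressing (true) instead.
    awsConf.S3ForcePathStyle = aws.Bool(false)

    // session.New is deprecated in aws-sdk-go; NewSession reports errors.
    svc = s3.New(session.Must(session.NewSession(awsConf)))
} else {
    // Default AWS path: only the region is overridden.
    awsConf.Region = aws.String(region)
    svc = s3.New(_session(), awsConf)
    debug.Assertf(region == *svc.Config.Region, "%s != %s", region, *svc.Config.Region)
}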

Supporting this would let AIStore work with many other S3-compatible providers (MinIO, Wasabi, Alibaba, OVH, Scaleway, Tencent, QingCloud, Contabo, Vultr, DigitalOcean, etc.).
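As a side note on the environment-variable approach above, here is a small standard-library sketch of how the endpoint could be validated before it is handed to the SDK. It is only an illustration under stated assumptions: the `endpointConfig` type and `loadEndpointConfig` function are hypothetical, and deriving `DisableSSL` from the URL scheme is a design choice of this sketch, not something the snippet above or AIStore does. Only the variable names `AWS_ENDPOINT` and `AWS_DEFAULT_REGION` come from the snippet.

```go
package main

import (
	"fmt"
	"net/url"
	"os"
)

// endpointConfig (hypothetical) holds the settings that would later be
// copied into the SDK configuration.
type endpointConfig struct {
	Endpoint   string // empty means "use the default AWS endpoint"
	Region     string
	DisableSSL bool // assumption of this sketch: true for http:// endpoints
}

// loadEndpointConfig reads AWS_ENDPOINT and AWS_DEFAULT_REGION through
// getenv and rejects endpoints that are not absolute http(s) URLs.
func loadEndpointConfig(getenv func(string) string) (endpointConfig, error) {
	cfg := endpointConfig{Region: getenv("AWS_DEFAULT_REGION")}
	raw := getenv("AWS_ENDPOINT")
	if raw == "" {
		return cfg, nil // no custom endpoint requested
	}
	u, err := url.Parse(raw)
	if err != nil {
		return cfg, fmt.Errorf("invalid AWS_ENDPOINT %q: %w", raw, err)
	}
	if u.Host == "" || (u.Scheme != "http" && u.Scheme != "https") {
		return cfg, fmt.Errorf("AWS_ENDPOINT %q must look like http(s)://host[:port]", raw)
	}
	cfg.Endpoint = u.Scheme + "://" + u.Host
	cfg.DisableSSL = u.Scheme == "http"
	return cfg, nil
}

func main() {
	cfg, err := loadEndpointConfig(os.Getenv)
	if err != nil {
		fmt.Println("configuration error:", err)
		return
	}
	if cfg.Endpoint == "" {
		fmt.Println("no custom endpoint set, using the default AWS endpoint")
		return
	}
	fmt.Printf("custom endpoint %s (region %q, DisableSSL=%v)\n", cfg.Endpoint, cfg.Region, cfg.DisableSSL)
}
```

Passing `getenv` as a parameter keeps the function easy to test with a plain map instead of the real process environment.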

VirrageS commented 2 years ago

Thanks! This is really helpful; we will look into adding this feature.

VirrageS commented 2 years ago

@mrhamburg The feature has been implemented in https://github.com/NVIDIA/aistore/commit/e8a5cfab7c84dd0675c5c48c6c41dbe055e9b13a. Let us know if you have any more questions :)