Genbox / SimpleS3

A .NET Core implementation of Amazon's S3 API with a focus on simplicity, security and performance
MIT License

Presign multipart uploads #59

Closed · holly-hacker closed this issue 3 weeks ago

holly-hacker commented 1 month ago

Description of the feature

I need to allow clients of my application to upload large files to S3-compatible storage, which requires presigned multipart uploads. From what I can tell, no S3 libraries for .NET seem to support this right now. Is this something you plan on implementing? Or is it perhaps already possible, but am I just overlooking some API?

Genbox commented 4 weeks ago

There is only support for presigned GET/PUT/DELETE/HEAD at the moment. I can add a pre-signed multipart client.

Genbox commented 4 weeks ago

I've built it. I'll release it tomorrow after some more testing.

holly-hacker commented 3 weeks ago

Thanks a lot! I can't give much, but I've sent a one-time sponsorship your way that matches the commercial tier for half a year. Thank you for maintaining these open-source libraries <3

Genbox commented 3 weeks ago

I've released version 3.1, which has a generic SignRequest method that can create a pre-signed URL for any request. You could just use an HttpClient to post data to the URL, but to support future ideas of client-side encryption, I've also provided a simple API for using the pre-signed requests.

Here is how you use the API:

string key = "data.txt";

//Create a multipart upload
CreateMultipartUploadResponse createResp = await client.CreateMultipartUploadAsync(bucket, key);

List<S3PartInfo> infos = new List<S3PartInfo>();

await using (MemoryStream ms = new MemoryStream("hello world"u8.ToArray()))
{
    UploadPartRequest req = new UploadPartRequest(bucket, key, createResp.UploadId, 1, null); //We add null in place of content

    //This gives you a pre-signed URL
    string url = client.SignRequest(req, TimeSpan.FromSeconds(100));

    //For convenience, I've provided a SendSignedRequestAsync function. However, you could just use HttpClient.PutAsync()
    UploadPartResponse partResp = await client.SendSignedRequestAsync<UploadPartResponse>(url, HttpMethodType.PUT, ms);

    //We need to save information about the parts for the complete step further down
    infos.Add(new S3PartInfo(partResp.ETag, 1));
}

CompleteMultipartUploadResponse compResp = await client.CompleteMultipartUploadAsync(bucket, key, createResp.UploadId, infos);

Thank you very much for the sponsorship. It is much appreciated!

Genbox commented 3 weeks ago

Note: You can also pre-sign the CreateMultipartUploadRequest, but not a CompleteMultipartUploadRequest, since it uses XML in its request body.
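For completeness, pre-signing the create step might look roughly like this. This is a sketch assuming the same SignRequest/SendSignedRequestAsync API shown above; the CreateMultipartUploadRequest constructor arguments and whether SendSignedRequestAsync accepts a null body are assumptions, not confirmed API details:

```csharp
//Sketch: pre-sign the create step as well (assumes the v3.1 API above)
CreateMultipartUploadRequest createReq = new CreateMultipartUploadRequest(bucket, key);

//This gives you a pre-signed URL, valid for 5 minutes
string createUrl = client.SignRequest(createReq, TimeSpan.FromMinutes(5));

//A client without credentials can now start the multipart upload via the URL
CreateMultipartUploadResponse createResp =
    await client.SendSignedRequestAsync<CreateMultipartUploadResponse>(createUrl, HttpMethodType.POST, null);
```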

holly-hacker commented 3 weeks ago

I'm currently in the process of moving from minio-dotnet to SimpleS3.GenericS3 in my project, and I notice that the new SignRequest API does not add an http:// or https:// scheme to the signed URL. Is that intended? I'm currently working around it like so:

var req = new PutObjectRequest(s3Service.Bucket, fileAttachment.ObjectStorageKey, null);
var url = s3Service.S3Client.SignRequest(req, TimeSpan.FromMinutes(5));

// TODO: check SimpleS3Config.UseTls instead
if (!url.StartsWith("http://") && !url.StartsWith("https://"))
    url = env.IsDevelopment() ? $"http://{url}" : $"https://{url}";

To be clear, I am not using multipart uploads yet. I'm only converting my existing signed PutObject code.


My local Minio instance is also giving me a "Forbidden" due to a signature failure, with the following response (formatted for readability). Possibly related to NamingMode.PathStyle?

<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>SignatureDoesNotMatch</Code>
  <Message>
    The request signature we calculated does not match the signature you provided. Check your key and signing method.
  </Message>
  <Key>creator_profile/b96b2778-39b6-4706-b2ee-2fe8817f35cf/post/af6220e4-596a-4a34-8ab1-4911d072e10c/attachment/3453a3f0-9be7-4de1-96dc-92a87a762c1a_0c8ed5f8-9e24-43c5-8ae1-d21e1d432969/test_1.txt</Key>
  <BucketName>test-bucket</BucketName>
  <Resource>/test-bucket/creator_profile/b96b2778-39b6-4706-b2ee-2fe8817f35cf/post/af6220e4-596a-4a34-8ab1-4911d072e10c/attachment/3453a3f0-9be7-4de1-96dc-92a87a762c1a_0c8ed5f8-9e24-43c5-8ae1-d21e1d432969/test_1.txt</Resource>
  <RequestId>17EE27D171D64AD9</RequestId>
  <HostId>dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8</HostId>
</Error>

In case it is relevant, this is my setup:

<PackageReference Include="Genbox.SimpleS3.Core" Version="3.1.1" />
<PackageReference Include="Genbox.SimpleS3.GenericS3" Version="3.1.1" />
// A dummy IRegionData implementation that returns a single region from config.
// Required: without it, DI validation fails.
builder.Services.AddTransient<IRegionData, DummyRegionData>();

builder.Services.AddGenericS3(c =>
{
    var endpoint = ...;
    var accessKey = ...;
    var secretKey = ...;
    var region = ...;

    c.Endpoint = new Uri(endpoint);
    c.Credentials = new StringAccessKey(accessKey, secretKey);
    c.RegionCode = region;
    c.NamingMode = NamingMode.PathStyle;

    if (builder.Environment.IsDevelopment())
        c.UseTls = false;
});

And this is how I use the presigned URL (response from GetUploadUrlAsync):

public async Task UploadFile(Guid postAttachmentId, Stream stream, CancellationToken ct = default)
{
    var uploadUrl = await apiClient.GetUploadUrlAsync(postAttachmentId, ct);

    var response = await httpClient.PutAsync(uploadUrl, new StreamContent(stream), ct);

    if (!response.IsSuccessStatusCode)
    {
        var body = await response.Content.ReadAsStringAsync(ct);
        // TODO: maybe parse this and return it to the user
        Debug.WriteLine("Failed to upload file: " + body);

        throw new Exception($"Failed to upload file, status code {response.StatusCode}, content: {body}");
    }

    await apiClient.MarkUploadedAsync(postAttachmentId, ct);
}
Genbox commented 3 weeks ago

Last time I checked, Minio had a small difference in how it calculates the signature, which is incompatible with Amazon S3. I'll look into what is causing the issue.

As for the HTTP scheme issue, it is probably related to GenericS3, as all the unit tests (which cover B2, AWS, Wasabi and GCS) are passing. I'll look into that too.

Thanks for all the example code. Makes it much easier to replicate the issues!

Genbox commented 3 weeks ago

@holly-hacker you can follow the progress in each of the issues I've created separately.

I'm able to work with a local instance of Minio now. Code:

string accessKey = "8GCq0aMeFPyCPY9NXVhQ";
string secretKey = "EWpEoJokSnAFCWr5f1GgFSTPBinKGTSYGFHFEG9W";
string endpoint = "{Scheme}://127.0.0.1:9000/{Bucket}";
string region = "us-east-1";

GenericS3Config config = new GenericS3Config(accessKey, secretKey, endpoint, region);
config.PayloadSignatureMode = SignatureMode.FullSignature;
config.NamingMode = NamingMode.PathStyle;

GenericS3Client client = new GenericS3Client(config);
PutObjectResponse resp = await client.PutObjectStringAsync("mybucket", "hello.txt", "hello world");
Console.WriteLine(resp.IsSuccess);
Genbox commented 3 weeks ago

I also tested with presigned requests like so:

PutObjectRequest req = new PutObjectRequest("mybucket", "hello.txt", null);
string url = client.SignRequest(req, TimeSpan.FromMinutes(5));

using MemoryStream ms = new MemoryStream("hello world"u8.ToArray());
PutObjectResponse resp = await client.SendSignedRequestAsync<PutObjectResponse>(url, HttpMethodType.PUT, ms);
Console.WriteLine(resp.IsSuccess);
Genbox commented 3 weeks ago

I can't replicate the issue with RegionData. It used to be an issue in an older version of SimpleS3—perhaps you were running one of those versions?

There is a unit test here: https://github.com/Genbox/SimpleS3/blob/master/Src/SimpleS3.Extensions.GenericS3.Tests/GenericS3IssueTests.cs#L16

It succeeds without any problems. If you still have the problem, please create a new issue and I'll look into it.

Genbox commented 3 weeks ago

Quick note: You might want to use 3.2.1. I found a bug when writing the unit test linked above. I'll do my best to test GenericS3 going forward.

holly-hacker commented 3 weeks ago

Strange, I was using v3.1.1 and I still need to inject my own IRegionData on v3.2.1. I'm using AddGenericS3 from Genbox.SimpleS3.GenericS3.Extensions and looking through the code in Rider, I don't see any place where IRegionData is injected in that code path.

Full exception message: System.InvalidOperationException: Error while validating the service descriptor 'ServiceType: Genbox.SimpleS3.Core.Abstracts.Region.IRegionConverter Lifetime: Singleton ImplementationType: Genbox.SimpleS3.Core.Common.RegionConverter': Unable to resolve service for type 'Genbox.SimpleS3.Core.Abstracts.Region.IRegionData' while attempting to activate 'Genbox.SimpleS3.Core.Common.RegionConverter'.


I'm not sure how I feel about the changes to the Endpoint field; they feel very backwards.

Every other S3 library I've tried allows for just setting the endpoint, credentials, bucket, URL style, region and whether to use HTTPS. I know for sure that this is what minio-dotnet uses, because that's what I'm migrating from. Now it seems that I need to set a template for my endpoint, or else the other settings get ignored? I've never heard of an object storage provider where a simple virtual-host vs. path-style distinction wasn't enough and a full-on template was required.

This is also a breaking change on a minor release. It will break people in a way they cannot catch with a compiler error, and it will likely only surface when they run integration tests against their object storage, or when they deploy to production.

I really appreciate the work you're putting in this, but this seems to be going in the wrong direction and I'm more inclined to use v3.1.x and work around the bugs right now.

EDIT: if template functionality really is required, perhaps it's better to create an EndpointTemplate property that is mutually exclusive with NamingMode instead of with Endpoint. If EndpointTemplate is not filled in, a default template is selected based on NamingMode. If it is filled in, NamingMode is disregarded and the chosen template is used. The template would then have a variable for the Endpoint property.
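A rough sketch of what I mean (EndpointTemplate is not an existing SimpleS3 property; all names here are purely illustrative):

```csharp
builder.Services.AddGenericS3(c =>
{
    c.Endpoint = new Uri("https://127.0.0.1:9000");

    //Default template is derived from NamingMode,
    //e.g. PathStyle -> "{Scheme}://{Endpoint}/{Bucket}"
    c.NamingMode = NamingMode.PathStyle;

    //Hypothetical property: if set, NamingMode is ignored and
    //this template is used instead, with {Endpoint} filled in from above
    //c.EndpointTemplate = "{Scheme}://{Bucket}.{Endpoint}";
});
```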

holly-hacker commented 3 weeks ago

I've created a very simple reproduction for the IRegionData issue here, but all it does is call builder.Services.AddGenericS3(c => {}). There's nothing more to it.

SimpleS3RegionDataRepro.zip

Genbox commented 3 weeks ago

I'm testing with GenericS3Client. The region data is likely only needed when not using DI. I've created an issue here: https://github.com/Genbox/SimpleS3/issues/62

Genbox commented 3 weeks ago

As for the template - it is very much needed depending on which S3 provider you use. Minio doesn't need it since it only supports path-style buckets, but there is nothing to stop you from creating your own scalable infrastructure with DNS routing, and then you can no longer use Minio's client.

That being said, you don't have to use a template. It is only needed when using the VirtualHost naming style.

The endpoint from above ({Scheme}://127.0.0.1:9000/{Bucket}) could simply be https://127.0.0.1:9000/ when using Minio, as long as you set NamingMode to PathStyle, as Minio requires.
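In other words, these two configurations should be equivalent for Minio (a sketch based on the GenericS3Config constructor used earlier in this thread; the credentials and endpoint are the local test values from above):

```csharp
//Option 1: spell out the template in the endpoint
GenericS3Config withTemplate = new GenericS3Config(
    accessKey, secretKey, "{Scheme}://127.0.0.1:9000/{Bucket}", region);

//Option 2: plain endpoint, let NamingMode pick the layout
GenericS3Config withNamingMode = new GenericS3Config(
    accessKey, secretKey, "https://127.0.0.1:9000/", region);
withNamingMode.NamingMode = NamingMode.PathStyle; //Minio only supports path-style
```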