gaul / s3proxy

Access other storage backends via the S3 API
Apache License 2.0

Multi-Part upload to Azure Blob causes growing files #468

Open BasJ93 opened 1 year ago

BasJ93 commented 1 year ago

We've deployed S3Proxy to an AKS cluster as the proxy in front of Azure Storage. About half of the blobs we upload are smaller than the default 4 MB multipart threshold for Azure; the rest are 25 MB or larger.

The 25 MB files cause problems: when we upload one through the proxy to a key that already exists, the stored blob grows. The new data appears to be appended to the existing blob instead of replacing it.

We currently work around this by sending a delete request before uploading, but we would prefer that overwriting works as expected.
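To make the difference between the expected and the reported behavior concrete, here is a toy in-memory model. It is a sketch only: `ToyStore`, `put_expected`, `put_observed` and `upload_with_delete_workaround` are hypothetical names, not S3Proxy, jclouds or Azure APIs. `put_expected` follows S3 semantics (a completed upload replaces the object), `put_observed` mimics the symptom described above, and the workaround deletes first.

```python
"""Toy model of the overwrite semantics discussed in this issue.

All names here are hypothetical and illustrative only; they are not
S3Proxy, jclouds or Azure APIs.
"""


class ToyStore:
    def __init__(self) -> None:
        # Maps object key -> stored bytes.
        self.objects: dict[str, bytes] = {}

    def put_expected(self, key: str, data: bytes) -> None:
        # S3 semantics: a PUT or completed multipart upload replaces the object.
        self.objects[key] = data

    def put_observed(self, key: str, data: bytes) -> None:
        # Mimics the reported symptom: new data ends up after the old blob.
        self.objects[key] = self.objects.get(key, b"") + data

    def delete(self, key: str) -> None:
        # Deleting a missing key is a no-op, as in S3.
        self.objects.pop(key, None)

    def size(self, key: str) -> int:
        return len(self.objects[key])


def upload_with_delete_workaround(store: ToyStore, key: str, data: bytes) -> None:
    # The workaround from the issue: delete the key first, then upload.
    store.delete(key)
    store.put_observed(key, data)


if __name__ == "__main__":
    payload = b"x" * 1024
    store = ToyStore()
    for _ in range(2):  # upload the same payload to the same key twice
        store.put_expected("expected", payload)
        store.put_observed("observed", payload)
        upload_with_delete_workaround(store, "workaround", payload)
    print(store.size("expected"), store.size("observed"), store.size("workaround"))
    # prints: 1024 2048 1024
```

The delete-first workaround hides the symptom because there is never an existing blob for the new data to land on, at the cost of an extra request per upload.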

When we run the same application code against MinIO, or call the Azure API directly, we do not see the file size grow.

Perhaps we missed some configuration option? If not, have we discovered a bug?

gaul commented 1 year ago

Can you give more specific steps to reproduce your symptoms and describe the expected behavior? I don't understand what is happening.

BasJ93 commented 1 year ago

@gaul Of course, please see if this helps.

Expected behavior

Observed behavior

Configuration

We've deployed s3proxy to an AKS cluster in this deployment:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: s3proxy
  namespace: s3proxy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: s3proxy
  template:
    metadata:
      labels:
        app: s3proxy
    spec:
      containers:
      - name: s3proxy
        image: andrewgaul/s3proxy:latest
        imagePullPolicy: Always
        ports:
        - containerPort: 80
        env:
        - name: LOG_LEVEL
          value: trace
        - name: JCLOUDS_PROVIDER
          value: azureblob
        - name: JCLOUDS_IDENTITY
          valueFrom:
            secretKeyRef:
              name: azure-credentials
              key: accesskey
        - name: JCLOUDS_CREDENTIAL
          valueFrom:
            secretKeyRef:
              name: azure-credentials
              key: secretkey
        - name: S3PROXY_IDENTITY
          valueFrom:
            secretKeyRef:
              name: proxy-credentials
              key: accesskey
        - name: S3PROXY_CREDENTIAL
          valueFrom:
            secretKeyRef:
              name: proxy-credentials
              key: secretkey

Steps to reproduce

To reproduce the effect, I used the mc client (which we also use to access s3proxy) and copied the mc binary itself to the storage. The first copy works as expected: I upload 24.11 MiB and then download 24.11 MiB.

When I then copy the exact same file to the exact same location, the file doubles in size: I upload 24.11 MiB but download 48.21 MiB. See the output below.

root@my-shell:/# ./mc cp /home/mc s3proxy/test
/home/mc:      24.11 MiB / 24.11 MiB ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.58 MiB/s 15s
root@my-shell:/# ./mc cp s3proxy/test/mc /home/mc2
...proxy/test/mc: 24.11 MiB / 24.11 MiB ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.15 MiB/s 21s
root@my-shell:/# ./mc cp /home/mc s3proxy/test
/home/mc:         24.11 MiB / 24.11 MiB ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.62 MiB/s 14s
root@my-shell:/# ./mc cp s3proxy/test/mc /home/mc3
...proxy/test/mc: 48.21 MiB / 48.21 MiB ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.15 MiB/s 41s
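The sizes in the transcript are consistent with the second download being exactly twice the original file, which fits the "appended instead of replaced" description. A small sketch checks this, assuming mc rounds sizes to two decimals (so each displayed value can be off by up to 0.005 MiB); the helper names are hypothetical.

```python
from fractions import Fraction


def interval(displayed: str) -> tuple[Fraction, Fraction]:
    # Assumption: a value shown with two decimals lies in [v - 0.005, v + 0.005).
    v = Fraction(displayed)
    half = Fraction(1, 200)
    return v - half, v + half


def consistent_multiples(uploaded: str, downloaded: str, max_n: int = 10) -> list[int]:
    """Return every n for which n * (true uploaded size) could display as `downloaded`."""
    up_lo, up_hi = interval(uploaded)
    down_lo, down_hi = interval(downloaded)
    result = []
    for n in range(1, max_n + 1):
        # Does the interval n * [up_lo, up_hi) overlap [down_lo, down_hi)?
        if n * up_lo < down_hi and down_lo < n * up_hi:
            result.append(n)
    return result


if __name__ == "__main__":
    print(consistent_multiples("24.11", "24.11"))  # first download: [1]
    print(consistent_multiples("24.11", "48.21"))  # second download: [2]
```

Exact fractions are used so the rounding bounds are not disturbed by floating-point error.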

So, steps to reproduce:

Additional remarks

We do not observe this effect when using MinIO without s3proxy, or when uploading files directly to Azure Storage without s3proxy.

BasJ93 commented 1 year ago

@gaul, were you able to reproduce this issue?

gaul commented 1 week ago

Could you try testing with the new azureblob-sdk provider from #606?

gaul commented 1 week ago

Based on implementing the azureblob-sdk provider I suspect that the jclouds-based provider should set an overwrite=true flag somewhere.
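The thread does not confirm the root cause. As background only, Azure block blobs are written by staging blocks (Put Block) and then committing a block list (Put Block List); the committed blob is exactly the listed blocks, in order, and blocks not listed are discarded. The toy model below, with hypothetical names (`BlockBlob`, `stage_block`, `commit_block_list`, `upload_parts`) that are not the Azure SDK or jclouds API, shows that a commit which still names the previous upload's committed blocks would produce the doubled size seen in the transcript. This is one hypothetical mechanism for illustration, not a diagnosis.

```python
"""Toy model of Azure block-blob commit semantics (hypothetical names)."""


class BlockBlob:
    def __init__(self) -> None:
        self.committed: dict[str, bytes] = {}    # committed block id -> data
        self.order: list[str] = []               # committed block ids, in blob order
        self.uncommitted: dict[str, bytes] = {}  # staged, not yet committed

    def stage_block(self, block_id: str, data: bytes) -> None:
        self.uncommitted[block_id] = data

    def commit_block_list(self, block_ids: list[str]) -> None:
        # Each id resolves to a staged block if present, else to a committed
        # one (similar to the "Latest" element of Put Block List).
        new_blocks = {}
        for block_id in block_ids:
            if block_id in self.uncommitted:
                new_blocks[block_id] = self.uncommitted[block_id]
            else:
                new_blocks[block_id] = self.committed[block_id]
        # Blocks not named in the list are discarded.
        self.committed = new_blocks
        self.order = list(block_ids)
        self.uncommitted.clear()

    def content(self) -> bytes:
        return b"".join(self.committed[i] for i in self.order)


def upload_parts(blob: BlockBlob, prefix: str, parts: list[bytes], keep_existing: bool) -> None:
    ids = []
    for n, part in enumerate(parts, start=1):
        block_id = f"{prefix}-{n}"  # hypothetical per-upload block ids
        blob.stage_block(block_id, part)
        ids.append(block_id)
    # keep_existing models a (hypothetical) commit that also lists the old blocks.
    blob.commit_block_list(blob.order + ids if keep_existing else ids)


if __name__ == "__main__":
    parts = [b"a" * 4, b"b" * 4]
    replace, grow = BlockBlob(), BlockBlob()
    for prefix in ("u1", "u2"):  # two uploads of the same data
        upload_parts(replace, prefix, parts, keep_existing=False)
        upload_parts(grow, prefix, parts, keep_existing=True)
    print(len(replace.content()), len(grow.content()))  # prints: 8 16
```

A commit that lists only the new upload's blocks replaces the blob, which is the behavior the reporter expects.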