databacker / mysql-backup

image to enable automated backups of mysql databases in containers

Error when using S3 compatible storage endpoint (Cloudflare's R2) #331

Closed ekadin-mtc closed 1 month ago

ekadin-mtc commented 1 month ago

I was using the latest tag without issue to back up a MySQL (MariaDB) database to R2 storage. I tried switching to 1.0.0-rc4 or master and it stopped working. It seems to be looking for the storage endpoint on AWS.

Here is the job config as a Kubernetes CronJob:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: mysqldump
spec:
  schedule: '45 12 * * *'
  concurrencyPolicy: Forbid
  timeZone: America/New_York
  successfulJobsHistoryLimit: 2
  failedJobsHistoryLimit: 2
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: mysqldump
            image: databack/mysql-backup:1.0.0-rc4
            args:
              - dump
            env:
              - name: RUN_ONCE
                value: "true"
              - name: DB_DUMP_TARGET
                value: s3://*********/db-backup
              - name: AWS_ENDPOINT_URL
                value: https://***************************.r2.cloudflarestorage.com
              - name: DB_SERVER
                value: mariadb-062323
              - name: DB_PORT
                value: '3306'
              - name: DB_USER
                value: root
              - name: DB_PASS
                value: **************************
              - name: AWS_ACCESS_KEY_ID
                value: **************************
              - name: AWS_SECRET_ACCESS_KEY
                value: *****************************************
              - name: AWS_DEFAULT_REGION
                value: auto
          restartPolicy: OnFailure

Here are the container logs:

time="2024-07-23T14:25:20Z" level=info msg="beginning dump 2024-07-23T14:25:20Z"
Error: error running command: error running command: error running dump: failed to push file: failed to upload file, operation error S3: CreateMultipartUpload, https response error StatusCode: 0, RequestID: , HostID: , request send failed, Post "https://**********.s3.auto.amazonaws.com//db-backup/db_backup_2024-07-23T14%3A25%3A20Z.tgz?uploads=&x-id=CreateMultipartUpload": dial tcp: lookup ***********.s3.auto.amazonaws.com on 10.152.183.10:53: no such host
time="2024-07-23T14:25:27Z" level=fatal msg="error running command: error running command: error running dump: failed to push file: failed to upload file, operation error S3: CreateMultipartUpload, https response error StatusCode: 0, RequestID: , HostID: , request send failed, Post \"https://***********.s3.auto.amazonaws.com//db-backup/db_backup_2024-07-23T14%3A25%3A20Z.tgz?uploads=&x-id=CreateMultipartUpload\": dial tcp: lookup *************.s3.auto.amazonaws.com on 10.152.183.10:53: no such host"
deitch commented 1 month ago

I reformatted the issue for easier reading.

deitch commented 1 month ago

This is helpful. Will look and figure this out. Thanks for the issue.

deitch commented 1 month ago

The integration test uses a local s3 server (fakes3) and it works. So clearly it is capable of sending to other endpoints. It appears that some combination of flags/env vars is triggering a bug in how it processes them. Let's see if we can narrow it down.

deitch commented 1 month ago

@ekadin-mtc I suspect that it has to do with AWS_REGION (or AWS_DEFAULT_REGION) being set to auto. Yes, of course, that should work, but I would like to isolate whether that is the issue. Can you change the region and see what happens?
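For example, testing that would just mean swapping the region value in the CronJob env above (us-east-1 here is only a probe value; auto is what R2 normally expects):

              - name: AWS_DEFAULT_REGION
                value: us-east-1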

ekadin-mtc commented 1 month ago

@deitch I set the region to us-east-1 and this time it complained about the Access Key ID, which I guess makes sense since this time it didn't fail trying to resolve the URL first.

time="2024-07-25T18:22:01Z" level=info msg="beginning dump 2024-07-25T18:22:01Z" Error: error running command: error running command: error running dump: failed to push file: failed to upload file, operation error S3: CreateMultipartUpload, https response error StatusCode: 403, RequestID: 62GA6SS4DC42WJJ3, HostID: 3xxxxxxxxxxxxxxxxxxxxxxxxxxxxxvN9ILAaBVklO7Hw=, api error InvalidAccessKeyId: The AWS Access Key Id you provided does not exist in our records. time="2024-07-25T18:22:06Z" level=fatal msg="error running command: error running command: error running dump: failed to push file: failed to upload file, operation error S3: CreateMultipartUpload, https response error StatusCode: 403, RequestID: 62GA6SS4DC42WJJ3, HostID: 3xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxN9ILAaBVklO7Hw=, api error InvalidAccessKeyId: The AWS Access Key Id you provided does not exist in our records."

deitch commented 1 month ago

A few notes:

At this point, I would like to know which combination of settings actually works. If we can determine the right combinations (and the wrong ones) and document them, that would be pretty good.

deitch commented 1 month ago

@ekadin-mtc can you take another look? I just bumped the aws-sdk-go-v2 version by quite a few releases. I then tested with both auto and us-east-1 and saw the same output. I also ran tcpdump at the same time to look for traffic to my minio server running on localhost:9000, and got the same traffic in both cases.

Give it a shot and post here?
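
For context, here is a minimal sketch of how a custom S3-compatible endpoint is usually wired up with aws-sdk-go-v2 (illustrative only, not necessarily this project's exact code). If no base endpoint is supplied, the SDK derives the host from the region, which is how a region of auto can end up as the unresolvable <bucket>.s3.auto.amazonaws.com seen in the logs above.

package main

import (
    "context"
    "log"
    "os"

    "github.com/aws/aws-sdk-go-v2/aws"
    "github.com/aws/aws-sdk-go-v2/config"
    "github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
    ctx := context.Background()

    // Credentials come from AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY;
    // R2 expects the region to be "auto".
    cfg, err := config.LoadDefaultConfig(ctx, config.WithRegion("auto"))
    if err != nil {
        log.Fatal(err)
    }

    // With BaseEndpoint set, requests go to the S3-compatible endpoint.
    // Without it, the SDK builds "<bucket>.s3.<region>.amazonaws.com"
    // from the region, which fails for a region like "auto".
    client := s3.NewFromConfig(cfg, func(o *s3.Options) {
        o.BaseEndpoint = aws.String(os.Getenv("AWS_ENDPOINT_URL")) // e.g. https://<account>.r2.cloudflarestorage.com
        o.UsePathStyle = true                                      // path-style keeps the bucket name out of the hostname
    })

    if _, err := client.ListBuckets(ctx, &s3.ListBucketsInput{}); err != nil {
        log.Fatal(err)
    }
    log.Println("connected to the S3-compatible endpoint")
}

Newer aws-sdk-go-v2 releases can also pick up AWS_ENDPOINT_URL from the environment on their own, which may be related to why the version bump helped here.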

ekadin-mtc commented 1 month ago

@deitch That seems to have done the trick! I used the databack/mysql-backup:master image and set AWS_REGION to auto and it worked perfectly! Thanks!

A couple of things I noticed though:

  1. The DB_DUMP_TARGET is adding a forward slash along the way somewhere. I have it set to s3://containerName/db-backup and you can see in the attached screenshot the way it used to be and the way it is now. [screenshot]

  2. Just a note for others running this as a job: it seems the env variable used to be RUN_ONCE and is now DB_DUMP_ONCE (I was trying to figure out why my job wasn't completing); see the snippet below.
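
For a CronJob like the one above, that rename just means changing the env entry, e.g. (a sketch based on the note above):

            env:
              - name: DB_DUMP_ONCE   # replaces RUN_ONCE in 1.0.0+, per the note above
                value: "true"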

deitch commented 1 month ago

It's always nice when the issue isn't in your own code but in an imported library, and a quick update fixes it.

The DB_DUMP_TARGET is adding a forward slash along the way somewhere. I have it set to s3://containerName/db-backup and you can see in this screenshot the way it used to be and the way it is now.

I am going to close this issue, so can you open a new one for it? I think your screenshot is from your S3 bucket, but I'm not sure. Either way, please explain it in a new issue. Better tracking that way.

Just a note for others running as a job: it seems the env variable used to be RUN_ONCE and it is now DB_DUMP_ONCE (was trying to figure out why my job wasn't completing)

Yes, I had to do some cleanup for v1.0.0. Because it no longer necessarily runs in a container (the standalone binary is first-class, alongside the container), we can no longer assume full control over environment variables. RUN_ONCE could mean a lot of things, and there was a case where it picked up something else unexpectedly. I took advantage of semver's rule that a major version change doesn't have to be compatible with the previous one (let alone when the previous one was pre-v1.0.0) to rename some of them. These are all in the configuration.md. Might not be a bad idea to create a migration doc, though.