archiecobbs / s3backer

FUSE/NBD single file backing store via Amazon S3
Other
535 stars 77 forks source link

Current time is not updated when a HTTP request is retried #214

Closed archiecobbs closed 6 months ago

archiecobbs commented 1 year ago

When an HTTP request is retried, s3backer does not change the request in any way.

This means that the Date header, and the authorization calculation which depends on it, get more and more out of sync with the current time as the request is repeatedly retried.

Depending on --initialRetryPause and --maxRetryPause, it's possible for this offset to exceed what the server allows, causing failures for that and all subsequent retries.

Ideally, s3backer should recalculate anything depending on the current time each time a request is retried.

archiecobbs commented 6 months ago

Should be fixed in 693e199.

khumarahn commented 6 months ago

Thank you, very nice. Will test!

khumarahn commented 6 months ago

Hi. I'm not trying to reopen this issue, just sharing that even with the update I'm seeing some retries that don't go through, first with a 500 error and then with 403 on the exponential backoff. The reason I can only guess. I should remember how to turn on debugging output :-)

Apr 22 10:27:55 kernel: block nbd4: Possible stuck request 000000007d0d3067: control (write@428769280,16384B). Runtime 30 seconds
Apr 22 10:28:25 kernel: block nbd4: Possible stuck request 000000007d0d3067: control (write@428769280,16384B). Runtime 60 seconds
Apr 22 10:28:25 nbdkit[409]: s3backer: rec'd 500 response: PUT https://charm.s3.fr-par.scw.cloud/prismatis/0000663a
Apr 22 10:28:26 nbdkit[409]: s3backer: retrying query (attempt #2): PUT https://charm.s3.fr-par.scw.cloud/prismatis/0000663a
Apr 22 10:28:26 nbdkit[409]: s3backer: rec'd 403 response: PUT https://charm.s3.fr-par.scw.cloud/prismatis/0000663a
Apr 22 10:28:27 nbdkit[409]: s3backer: retrying query (attempt #3): PUT https://charm.s3.fr-par.scw.cloud/prismatis/0000663a
Apr 22 10:28:27 nbdkit[409]: s3backer: rec'd 403 response: PUT https://charm.s3.fr-par.scw.cloud/prismatis/0000663a
Apr 22 10:28:29 nbdkit[409]: s3backer: retrying query (attempt #4): PUT https://charm.s3.fr-par.scw.cloud/prismatis/0000663a
Apr 22 10:28:29 nbdkit[409]: s3backer: rec'd 403 response: PUT https://charm.s3.fr-par.scw.cloud/prismatis/0000663a
Apr 22 10:28:33 nbdkit[409]: s3backer: retrying query (attempt #5): PUT https://charm.s3.fr-par.scw.cloud/prismatis/0000663a
Apr 22 10:28:33 nbdkit[409]: s3backer: rec'd 403 response: PUT https://charm.s3.fr-par.scw.cloud/prismatis/0000663a
Apr 22 10:28:41 nbdkit[409]: s3backer: retrying query (attempt #6): PUT https://charm.s3.fr-par.scw.cloud/prismatis/0000663a
Apr 22 10:28:42 nbdkit[409]: s3backer: rec'd 403 response: PUT https://charm.s3.fr-par.scw.cloud/prismatis/0000663a
Apr 22 10:28:56 kernel: block nbd4: Possible stuck request 000000007d0d3067: control (write@428769280,16384B). Runtime 90 seconds
Apr 22 10:28:58 nbdkit[409]: s3backer: retrying query (attempt #7): PUT https://charm.s3.fr-par.scw.cloud/prismatis/0000663a
Apr 22 10:28:58 nbdkit[409]: s3backer: rec'd 403 response: PUT https://charm.s3.fr-par.scw.cloud/prismatis/0000663a
Apr 22 10:29:26 kernel: block nbd4: Possible stuck request 000000007d0d3067: control (write@428769280,16384B). Runtime 120 seconds
Apr 22 10:29:30 nbdkit[409]: s3backer: retrying query (attempt #8): PUT https://charm.s3.fr-par.scw.cloud/prismatis/0000663a
Apr 22 10:29:30 nbdkit[409]: s3backer: rec'd 403 response: PUT https://charm.s3.fr-par.scw.cloud/prismatis/0000663a
Apr 22 10:29:56 kernel: block nbd4: Possible stuck request 000000007d0d3067: control (write@428769280,16384B). Runtime 150 seconds
Apr 22 10:30:26 kernel: block nbd4: Possible stuck request 000000007d0d3067: control (write@428769280,16384B). Runtime 180 seconds
Apr 22 10:30:34 nbdkit[409]: s3backer: retrying query (attempt #9): PUT https://charm.s3.fr-par.scw.cloud/prismatis/0000663a
Apr 22 10:30:34 nbdkit[409]: s3backer: rec'd 403 response: PUT https://charm.s3.fr-par.scw.cloud/prismatis/0000663a
Apr 22 10:30:57 kernel: block nbd4: Possible stuck request 000000007d0d3067: control (write@428769280,16384B). Runtime 210 seconds
Apr 22 10:31:27 kernel: block nbd4: Possible stuck request 000000007d0d3067: control (write@428769280,16384B). Runtime 240 seconds
Apr 22 10:31:57 kernel: block nbd4: Possible stuck request 000000007d0d3067: control (write@428769280,16384B). Runtime 270 seconds
Apr 22 10:32:27 kernel: block nbd4: Possible stuck request 000000007d0d3067: control (write@428769280,16384B). Runtime 300 seconds
Apr 22 10:32:42 nbdkit[409]: s3backer: retrying query (attempt #10): PUT https://charm.s3.fr-par.scw.cloud/prismatis/0000663a
Apr 22 10:32:42 nbdkit[409]: s3backer: rec'd 403 response: PUT https://charm.s3.fr-par.scw.cloud/prismatis/0000663a
Apr 22 10:32:57 kernel: block nbd4: Possible stuck request 000000007d0d3067: control (write@428769280,16384B). Runtime 330 seconds
Apr 22 10:33:28 kernel: block nbd4: Possible stuck request 000000007d0d3067: control (write@428769280,16384B). Runtime 360 seconds
Apr 22 10:33:58 kernel: block nbd4: Possible stuck request 000000007d0d3067: control (write@428769280,16384B). Runtime 390 seconds
Apr 22 10:34:28 kernel: block nbd4: Possible stuck request 000000007d0d3067: control (write@428769280,16384B). Runtime 420 seconds
Apr 22 10:34:58 kernel: block nbd4: Possible stuck request 000000007d0d3067: control (write@428769280,16384B). Runtime 450 seconds
Apr 22 10:35:28 kernel: block nbd4: Possible stuck request 000000007d0d3067: control (write@428769280,16384B). Runtime 480 seconds
Apr 22 10:35:59 kernel: block nbd4: Possible stuck request 000000007d0d3067: control (write@428769280,16384B). Runtime 510 seconds
Apr 22 10:36:29 kernel: block nbd4: Possible stuck request 000000007d0d3067: control (write@428769280,16384B). Runtime 540 seconds
Apr 22 10:36:58 nbdkit[409]: s3backer: retrying query (attempt #11): PUT https://charm.s3.fr-par.scw.cloud/prismatis/0000663a
Apr 22 10:36:59 nbdkit[409]: s3backer: rec'd 403 response: PUT https://charm.s3.fr-par.scw.cloud/prismatis/0000663a
Apr 22 10:36:59 kernel: block nbd4: Possible stuck request 000000007d0d3067: control (write@428769280,16384B). Runtime 570 seconds
khumarahn commented 6 months ago

I think I caught it, the error is Signature does not match.

Apr 22 11:23:32 s3backer[429]: > PUT /prismatis/00007255 HTTP/2
Apr 22 11:23:32 s3backer[429]: Host: charm.s3.fr-par.scw.cloud
Apr 22 11:23:32 s3backer[429]: User-Agent: s3backer/2.1.1/
Apr 22 11:23:32 s3backer[429]: Accept: */*
Apr 22 11:23:32 s3backer[429]: Content-Type: application/x-s3backer-block
Apr 22 11:23:32 s3backer[429]: Content-MD5: uBPZpgS7tYdO6y9lf7SlHg==
Apr 22 11:23:32 s3backer[429]: x-amz-acl: private
Apr 22 11:23:32 s3backer[429]: x-amz-content-sha256: d9b681cbd90b9ac1250c0aebf8192ff751d923553d65fe2d12b181078139cb24
Apr 22 11:23:32 s3backer[429]: x-amz-content-sha256: d9b681cbd90b9ac1250c0aebf8192ff751d923553d65fe2d12b181078139cb24
Apr 22 11:23:32 s3backer[429]: x-amz-content-sha256: d9b681cbd90b9ac1250c0aebf8192ff751d923553d65fe2d12b181078139cb24
Apr 22 11:23:32 s3backer[429]: x-amz-content-sha256: d9b681cbd90b9ac1250c0aebf8192ff751d923553d65fe2d12b181078139cb24
Apr 22 11:23:32 s3backer[429]: x-amz-content-sha256: d9b681cbd90b9ac1250c0aebf8192ff751d923553d65fe2d12b181078139cb24
Apr 22 11:23:32 s3backer[429]: x-amz-content-sha256: d9b681cbd90b9ac1250c0aebf8192ff751d923553d65fe2d12b181078139cb24
Apr 22 11:23:32 s3backer[429]: x-amz-content-sha256: d9b681cbd90b9ac1250c0aebf8192ff751d923553d65fe2d12b181078139cb24
Apr 22 11:23:32 s3backer[429]: x-amz-content-sha256: d9b681cbd90b9ac1250c0aebf8192ff751d923553d65fe2d12b181078139cb24
Apr 22 11:23:32 s3backer[429]: x-amz-date: 20240422T092332Z
Apr 22 11:23:32 s3backer[429]: x-amz-content-sha256: d9b681cbd90b9ac1250c0aebf8192ff751d923553d65fe2d12b181078139cb24
Apr 22 11:23:32 s3backer[429]: Authorization: AWS4-HMAC-SHA256 Credential=SCWN8FP9E6FS0GWEREVW/20240422/fr-par/s3/aws4_request, SignedHeaders=content-md5;content-type;host;x-amz-acl;x-amz-content-sha256;x-amz-content-sha256;x-amz-content-sha256;x-amz-content-sha256;x-amz-content-sha>
Apr 22 11:23:32 s3backer[429]: Content-Length: 16384
Apr 22 11:23:32 s3backer[429]: 
Apr 22 11:23:32 s3backer[429]: * upload completely sent off: 16384 bytes
Apr 22 11:23:32 s3backer[429]: * old SSL session ID is stale, removing
Apr 22 11:23:32 s3backer[429]: * old SSL session ID is stale, removing
Apr 22 11:23:32 s3backer[429]: < HTTP/2 403
Apr 22 11:23:32 s3backer[429]: < x-amz-id-2: tx1351303c076c41818a8db-0066262c94
Apr 22 11:23:32 s3backer[429]: < x-amz-request-id: tx1351303c076c41818a8db-0066262c94
Apr 22 11:23:32 s3backer[429]: < content-type: application/xml
Apr 22 11:23:32 s3backer[429]: < date: Mon, 22 Apr 2024 09:23:32 GMT
Apr 22 11:23:32 s3backer[429]: <
Apr 22 11:23:32 s3backer[429]: * Connection #12 to host charm.s3.fr-par.scw.cloud left intact
Apr 22 11:23:32 nbdkit[429]: s3backer: rec'd 403 response: PUT https://charm.s3.fr-par.scw.cloud/prismatis/00007255
Apr 22 11:23:32 nbdkit[429]: s3backer: HTTP 403 status response payload:
                             <?xml version='1.0' encoding='UTF-8'?>
                             <Error><Code>AccessDenied</Code><Message>Signature does not match.</Message><RequestId>tx1351303c076c41818a8db-0066262c94</RequestId></Error>
khumarahn commented 6 months ago

The original error is 500, as below. IIRC, these 500 are normal, and just need to be retried. Then all the retries fail with 403 as above

Apr 22 11:21:24 nbdkit[429]: s3backer: rec'd 500 response: PUT https://charm.s3.fr-par.scw.cloud/prismatis/00007255
Apr 22 11:21:24 nbdkit[429]: s3backer: HTTP 500 status response payload:
                             <?xml version='1.0' encoding='UTF-8'?>
                             <Error><Code>InternalError</Code><Message>&lt;html&gt;&lt;h1&gt;Internal Error&lt;/h1&gt;&lt;p&gt;The server has either erred or is incapable of performing the requested operation.&lt;/p&gt;&lt;/html&gt;</Message><RequestId>txac6bfd564dd0459c8af88-006626>
Apr 22 11:21:24 nbdkit[429]: s3backer: PUT https://charm.s3.fr-par.scw.cloud/prismatis/00007255
khumarahn commented 6 months ago

I rolled back to 2.1.1, and retries after http 500 are succeeding again

archiecobbs commented 6 months ago

Thanks for testing so quickly. There was just a dumb bug, which should be fixed in 22c56eb.

khumarahn commented 6 months ago

Thank you! Now is better, as least some 500 errors retry successfully, I am continuing observations!

khumarahn commented 6 months ago

This is working well so far.

@archiecobbs Do you plan to make a new tag (v2.1.2)?

Also, not sure what's the best place to ask, but what do you think about the various nbdkit filters if used with s3backer? There are retry, cache, blocksize and so on.

khumarahn commented 6 months ago

... also, the nbdclient's -connections option, does it have to correspond to nbdkit's --threads?

archiecobbs commented 6 months ago

Hi @khumarahn,

Thanks for your helpful testing. I've released version 2.1.2.

Regardind NBD questions, I'm not really that familiar with NBD - only just enough to use the API :)

However, I wouldn't be surprised if there are some functions that would be equally if not better done at the NBD layer, e.g., caching, encryption, etc.

khumarahn commented 6 months ago

Thanks!

khumarahn commented 6 months ago

One more question :-) Looks important, but I am not 100% sure.

nbdkit plugin api allows a callback .block_size: a plugin can report its block size, and nbd clients will use it to optimize requests. I can't find if that's implemented in s3backer though, I wonder if you gave it any thought

archiecobbs commented 6 months ago

Looks like .block_size was added in the past couple of years (commit).

Not sure how much difference it will make but it doesn't hurt to implement it.

See ef19209.

khumarahn commented 6 months ago

That was quick, thank you :-) I updated but did not test for speed. There's no obvious difference, but again I didn't test!