Closed archiecobbs closed 6 months ago
Should be fixed in 693e199.
Thank you, very nice. Will test!
Hi. I'm not trying to reopen this issue, just sharing that even with the update I'm seeing some retries that don't go through, first with a 500 error and then with 403 on the exponential backoff. The reason I can only guess. I should remember how to turn on debugging output :-)
Apr 22 10:27:55 kernel: block nbd4: Possible stuck request 000000007d0d3067: control (write@428769280,16384B). Runtime 30 seconds
Apr 22 10:28:25 kernel: block nbd4: Possible stuck request 000000007d0d3067: control (write@428769280,16384B). Runtime 60 seconds
Apr 22 10:28:25 nbdkit[409]: s3backer: rec'd 500 response: PUT https://charm.s3.fr-par.scw.cloud/prismatis/0000663a
Apr 22 10:28:26 nbdkit[409]: s3backer: retrying query (attempt #2): PUT https://charm.s3.fr-par.scw.cloud/prismatis/0000663a
Apr 22 10:28:26 nbdkit[409]: s3backer: rec'd 403 response: PUT https://charm.s3.fr-par.scw.cloud/prismatis/0000663a
Apr 22 10:28:27 nbdkit[409]: s3backer: retrying query (attempt #3): PUT https://charm.s3.fr-par.scw.cloud/prismatis/0000663a
Apr 22 10:28:27 nbdkit[409]: s3backer: rec'd 403 response: PUT https://charm.s3.fr-par.scw.cloud/prismatis/0000663a
Apr 22 10:28:29 nbdkit[409]: s3backer: retrying query (attempt #4): PUT https://charm.s3.fr-par.scw.cloud/prismatis/0000663a
Apr 22 10:28:29 nbdkit[409]: s3backer: rec'd 403 response: PUT https://charm.s3.fr-par.scw.cloud/prismatis/0000663a
Apr 22 10:28:33 nbdkit[409]: s3backer: retrying query (attempt #5): PUT https://charm.s3.fr-par.scw.cloud/prismatis/0000663a
Apr 22 10:28:33 nbdkit[409]: s3backer: rec'd 403 response: PUT https://charm.s3.fr-par.scw.cloud/prismatis/0000663a
Apr 22 10:28:41 nbdkit[409]: s3backer: retrying query (attempt #6): PUT https://charm.s3.fr-par.scw.cloud/prismatis/0000663a
Apr 22 10:28:42 nbdkit[409]: s3backer: rec'd 403 response: PUT https://charm.s3.fr-par.scw.cloud/prismatis/0000663a
Apr 22 10:28:56 kernel: block nbd4: Possible stuck request 000000007d0d3067: control (write@428769280,16384B). Runtime 90 seconds
Apr 22 10:28:58 nbdkit[409]: s3backer: retrying query (attempt #7): PUT https://charm.s3.fr-par.scw.cloud/prismatis/0000663a
Apr 22 10:28:58 nbdkit[409]: s3backer: rec'd 403 response: PUT https://charm.s3.fr-par.scw.cloud/prismatis/0000663a
Apr 22 10:29:26 kernel: block nbd4: Possible stuck request 000000007d0d3067: control (write@428769280,16384B). Runtime 120 seconds
Apr 22 10:29:30 nbdkit[409]: s3backer: retrying query (attempt #8): PUT https://charm.s3.fr-par.scw.cloud/prismatis/0000663a
Apr 22 10:29:30 nbdkit[409]: s3backer: rec'd 403 response: PUT https://charm.s3.fr-par.scw.cloud/prismatis/0000663a
Apr 22 10:29:56 kernel: block nbd4: Possible stuck request 000000007d0d3067: control (write@428769280,16384B). Runtime 150 seconds
Apr 22 10:30:26 kernel: block nbd4: Possible stuck request 000000007d0d3067: control (write@428769280,16384B). Runtime 180 seconds
Apr 22 10:30:34 nbdkit[409]: s3backer: retrying query (attempt #9): PUT https://charm.s3.fr-par.scw.cloud/prismatis/0000663a
Apr 22 10:30:34 nbdkit[409]: s3backer: rec'd 403 response: PUT https://charm.s3.fr-par.scw.cloud/prismatis/0000663a
Apr 22 10:30:57 kernel: block nbd4: Possible stuck request 000000007d0d3067: control (write@428769280,16384B). Runtime 210 seconds
Apr 22 10:31:27 kernel: block nbd4: Possible stuck request 000000007d0d3067: control (write@428769280,16384B). Runtime 240 seconds
Apr 22 10:31:57 kernel: block nbd4: Possible stuck request 000000007d0d3067: control (write@428769280,16384B). Runtime 270 seconds
Apr 22 10:32:27 kernel: block nbd4: Possible stuck request 000000007d0d3067: control (write@428769280,16384B). Runtime 300 seconds
Apr 22 10:32:42 nbdkit[409]: s3backer: retrying query (attempt #10): PUT https://charm.s3.fr-par.scw.cloud/prismatis/0000663a
Apr 22 10:32:42 nbdkit[409]: s3backer: rec'd 403 response: PUT https://charm.s3.fr-par.scw.cloud/prismatis/0000663a
Apr 22 10:32:57 kernel: block nbd4: Possible stuck request 000000007d0d3067: control (write@428769280,16384B). Runtime 330 seconds
Apr 22 10:33:28 kernel: block nbd4: Possible stuck request 000000007d0d3067: control (write@428769280,16384B). Runtime 360 seconds
Apr 22 10:33:58 kernel: block nbd4: Possible stuck request 000000007d0d3067: control (write@428769280,16384B). Runtime 390 seconds
Apr 22 10:34:28 kernel: block nbd4: Possible stuck request 000000007d0d3067: control (write@428769280,16384B). Runtime 420 seconds
Apr 22 10:34:58 kernel: block nbd4: Possible stuck request 000000007d0d3067: control (write@428769280,16384B). Runtime 450 seconds
Apr 22 10:35:28 kernel: block nbd4: Possible stuck request 000000007d0d3067: control (write@428769280,16384B). Runtime 480 seconds
Apr 22 10:35:59 kernel: block nbd4: Possible stuck request 000000007d0d3067: control (write@428769280,16384B). Runtime 510 seconds
Apr 22 10:36:29 kernel: block nbd4: Possible stuck request 000000007d0d3067: control (write@428769280,16384B). Runtime 540 seconds
Apr 22 10:36:58 nbdkit[409]: s3backer: retrying query (attempt #11): PUT https://charm.s3.fr-par.scw.cloud/prismatis/0000663a
Apr 22 10:36:59 nbdkit[409]: s3backer: rec'd 403 response: PUT https://charm.s3.fr-par.scw.cloud/prismatis/0000663a
Apr 22 10:36:59 kernel: block nbd4: Possible stuck request 000000007d0d3067: control (write@428769280,16384B). Runtime 570 seconds
I think I caught it, the error is Signature does not match.
Apr 22 11:23:32 s3backer[429]: > PUT /prismatis/00007255 HTTP/2
Apr 22 11:23:32 s3backer[429]: Host: charm.s3.fr-par.scw.cloud
Apr 22 11:23:32 s3backer[429]: User-Agent: s3backer/2.1.1/
Apr 22 11:23:32 s3backer[429]: Accept: */*
Apr 22 11:23:32 s3backer[429]: Content-Type: application/x-s3backer-block
Apr 22 11:23:32 s3backer[429]: Content-MD5: uBPZpgS7tYdO6y9lf7SlHg==
Apr 22 11:23:32 s3backer[429]: x-amz-acl: private
Apr 22 11:23:32 s3backer[429]: x-amz-content-sha256: d9b681cbd90b9ac1250c0aebf8192ff751d923553d65fe2d12b181078139cb24
Apr 22 11:23:32 s3backer[429]: x-amz-content-sha256: d9b681cbd90b9ac1250c0aebf8192ff751d923553d65fe2d12b181078139cb24
Apr 22 11:23:32 s3backer[429]: x-amz-content-sha256: d9b681cbd90b9ac1250c0aebf8192ff751d923553d65fe2d12b181078139cb24
Apr 22 11:23:32 s3backer[429]: x-amz-content-sha256: d9b681cbd90b9ac1250c0aebf8192ff751d923553d65fe2d12b181078139cb24
Apr 22 11:23:32 s3backer[429]: x-amz-content-sha256: d9b681cbd90b9ac1250c0aebf8192ff751d923553d65fe2d12b181078139cb24
Apr 22 11:23:32 s3backer[429]: x-amz-content-sha256: d9b681cbd90b9ac1250c0aebf8192ff751d923553d65fe2d12b181078139cb24
Apr 22 11:23:32 s3backer[429]: x-amz-content-sha256: d9b681cbd90b9ac1250c0aebf8192ff751d923553d65fe2d12b181078139cb24
Apr 22 11:23:32 s3backer[429]: x-amz-content-sha256: d9b681cbd90b9ac1250c0aebf8192ff751d923553d65fe2d12b181078139cb24
Apr 22 11:23:32 s3backer[429]: x-amz-date: 20240422T092332Z
Apr 22 11:23:32 s3backer[429]: x-amz-content-sha256: d9b681cbd90b9ac1250c0aebf8192ff751d923553d65fe2d12b181078139cb24
Apr 22 11:23:32 s3backer[429]: Authorization: AWS4-HMAC-SHA256 Credential=SCWN8FP9E6FS0GWEREVW/20240422/fr-par/s3/aws4_request, SignedHeaders=content-md5;content-type;host;x-amz-acl;x-amz-content-sha256;x-amz-content-sha256;x-amz-content-sha256;x-amz-content-sha256;x-amz-content-sha>
Apr 22 11:23:32 s3backer[429]: Content-Length: 16384
Apr 22 11:23:32 s3backer[429]:
Apr 22 11:23:32 s3backer[429]: * upload completely sent off: 16384 bytes
Apr 22 11:23:32 s3backer[429]: * old SSL session ID is stale, removing
Apr 22 11:23:32 s3backer[429]: * old SSL session ID is stale, removing
Apr 22 11:23:32 s3backer[429]: < HTTP/2 403
Apr 22 11:23:32 s3backer[429]: < x-amz-id-2: tx1351303c076c41818a8db-0066262c94
Apr 22 11:23:32 s3backer[429]: < x-amz-request-id: tx1351303c076c41818a8db-0066262c94
Apr 22 11:23:32 s3backer[429]: < content-type: application/xml
Apr 22 11:23:32 s3backer[429]: < date: Mon, 22 Apr 2024 09:23:32 GMT
Apr 22 11:23:32 s3backer[429]: <
Apr 22 11:23:32 s3backer[429]: * Connection #12 to host charm.s3.fr-par.scw.cloud left intact
Apr 22 11:23:32 nbdkit[429]: s3backer: rec'd 403 response: PUT https://charm.s3.fr-par.scw.cloud/prismatis/00007255
Apr 22 11:23:32 nbdkit[429]: s3backer: HTTP 403 status response payload:
<?xml version='1.0' encoding='UTF-8'?>
<Error><Code>AccessDenied</Code><Message>Signature does not match.</Message><RequestId>tx1351303c076c41818a8db-0066262c94</RequestId></Error>
The original error is 500, as below. IIRC, these 500 are normal, and just need to be retried. Then all the retries fail with 403 as above
Apr 22 11:21:24 nbdkit[429]: s3backer: rec'd 500 response: PUT https://charm.s3.fr-par.scw.cloud/prismatis/00007255
Apr 22 11:21:24 nbdkit[429]: s3backer: HTTP 500 status response payload:
<?xml version='1.0' encoding='UTF-8'?>
<Error><Code>InternalError</Code><Message><html><h1>Internal Error</h1><p>The server has either erred or is incapable of performing the requested operation.</p></html></Message><RequestId>txac6bfd564dd0459c8af88-006626>
Apr 22 11:21:24 nbdkit[429]: s3backer: PUT https://charm.s3.fr-par.scw.cloud/prismatis/00007255
I rolled back to 2.1.1, and retries after http 500 are succeeding again
Thanks for testing so quickly. There was just a dumb bug, which should be fixed in 22c56eb.
Thank you! Now is better, as least some 500 errors retry successfully, I am continuing observations!
This is working well so far.
@archiecobbs Do you plan to make a new tag (v2.1.2)?
Also, not sure what's the best place to ask, but what do you think about the various nbdkit filters if used with s3backer? There are retry
, cache
, blocksize
and so on.
... also, the nbdclient's -connections
option, does it have to correspond to nbdkit's --threads
?
Hi @khumarahn,
Thanks for your helpful testing. I've released version 2.1.2.
Regardind NBD questions, I'm not really that familiar with NBD - only just enough to use the API :)
However, I wouldn't be surprised if there are some functions that would be equally if not better done at the NBD layer, e.g., caching, encryption, etc.
Thanks!
One more question :-) Looks important, but I am not 100% sure.
nbdkit plugin api allows a callback .block_size
: a plugin can report its block size, and nbd clients will use it to optimize requests. I can't find if that's implemented in s3backer though, I wonder if you gave it any thought
Looks like .block_size
was added in the past couple of years (commit).
Not sure how much difference it will make but it doesn't hurt to implement it.
See ef19209.
That was quick, thank you :-) I updated but did not test for speed. There's no obvious difference, but again I didn't test!
When an HTTP request is retried, s3backer does not change the request in any way.
This means that the
Date
header, and the authorization calculation which depends on it, get more and more out of sync with the current time as the request is repeatedly retried.Depending on
--initialRetryPause
and--maxRetryPause
, it's possible for this offset to exceed what the server allows, causing failures for that and all subsequent retries.Ideally, s3backer should recalculate anything depending on the current time each time a request is retried.