archiecobbs / s3backer

FUSE/NBD single file backing store via Amazon S3

rec'd stale content when trimming zfs file system #135

Closed Kirin-kun closed 3 years ago

Kirin-kun commented 3 years ago

When executing a zpool trim on a pool created on an s3backer file, after a while, some errors appear in the log:

rec'd stale content: GET https://bucket.s3.eu-west-1.amazonaws.com/tce10b01441-0004d3ba

This repeats a few times before s3backer logs "giving up".

The pool warns about (corrected) write errors in its status.

Why write errors, when the message was about a GET?

I tried lowering the trim rate to 2048 bytes/s on the command line and the error no longer shows up, but the trim then becomes extremely slow, so basically useless. So I guess it's the speed of the trim process that triggers the issue.

The stats file typically looks like this. The deletes appear only when trimming, and it is when a delete occurs that some "rec'd stale" messages show up (not 828 times though, only a few).

http_normal_blocks_read      8274
http_normal_blocks_written   140504
http_zero_blocks_read        0
http_zero_blocks_written     828
http_empty_blocks_read       108551
http_empty_blocks_written    64668
http_gets                    8323
http_puts                    140505
http_deletes                 828
http_avg_get_time            0.102 sec
http_avg_put_time            0.117 sec
http_avg_delete_time         0.023 sec
http_unauthorized            0
http_forbidden               0
http_stale                   27
http_verified                0
http_mismatch                0
http_5xx_error               0
http_4xx_error               0
http_other_error             0
http_canceled_writes         6125
http_num_retries             24
http_total_retry_delay       90.000 sec
curl_handle_reuse_ratio      0.9604
curl_timeouts                0
curl_connect_failed          0
curl_host_unknown            0
curl_out_of_memory           0
curl_other_error             0
block_cache_current_size     5000 blocks
block_cache_initial_size     0 blocks
block_cache_dirty_ratio      0.0044
block_cache_read_hits        58006
block_cache_read_misses      6648
block_cache_read_hit_ratio   0.8972
block_cache_write_hits       1135693
block_cache_write_misses     174430
block_cache_write_hit_ratio  0.8669
block_cache_verified         0
block_cache_mismatch         0
md5_cache_current_size       168 blocks
md5_cache_data_hits          0
md5_cache_full_delays        0.000 sec
md5_cache_write_delays       3746.162 sec
out_of_memory_errors         0

Is there a particular reason delete operations cause stale errors on GETs?

As an aside, are http_canceled_writes something to be concerned about?

Any idea what rate to give the trim so it doesn't trigger this problem (if it isn't really fixable) while staying reasonably fast, so I can reclaim space in the bucket?

I understand it means that the block read isn't what s3backer was expecting. Did it expect an empty/non-existent block? Because the blocks with the stale error are still there when I look in the bucket (and will they stay there forever?).

archiecobbs commented 3 years ago

The "stale content" error means we wrote something to S3, then tried to read it back, but what S3 told us was there had a different checksum from what we previously wrote.

The --minWriteDelay and --md5CacheTime settings can be adjusted to try to keep this from occurring, if the problem is due to slow "eventual consistency" convergence.

In theory the problem could also be due to S3 not working properly (e.g., discarding a confirmed write) and then that block would be "stuck" in this state.

What settings are you using for the MD5 cache?

Why write errors, when the message was about a GET?

It was reading a block for some reason. Possibly your s3backer block size is larger than your zfs block size, so writing a zfs block requires first reading the (larger) s3backer block, modifying part of it, and then writing it back.

What are those block sizes?

As an aside, are http_canceled_writes something to be concerned about?

The http_canceled_writes are not a problem. That just means we were sending S3 a write request from the block cache when the block changed (again) in the block cache, so we canceled the first write request and started a new one. Is your block cache configured for write-through or write-back? You probably want some delayed write-back to help avoid canceled writes.

It's not clear what exactly is happening, and it would require some detailed analysis to figure it out. It's good that you have a way to correct the problem. It's possible that increasing --minWriteDelay might fix it (this would be assuming S3 is being really slow to converge).

Kirin-kun commented 3 years ago

My command line is:

s3backer --directIO --accessEC2IAM=instance_role --blockSize=1M --size=500G --timeout=15 --prefix=tce --listBlocks --region=eu-west-1 --baseURL=https://s3.eu-west-1.amazonaws.com/ --blockHashPrefix --blockCacheSize=5000 --blockCacheFile=/mnt/data/s3cache --blockCacheRecoverDirtyBlocks --compress=9 bucketname /mnt/s3

So minWriteDelay and md5CacheTime must have their default values.

Without directIO, I get "stale" errors for even the most basic operations, including just creating the filesystem. So the kernel cache creates more problems than it solves.

I chose a blocksize of 1M because I wanted to avoid frequent GETs/PUTs of small 4k blocks.

I set the pool's ashift to 12 at creation, so the filesystem blocksize should be 4k. Does it make sense to use a bigger blocksize, like ashift=13 (8k blocks) or even ashift=16 (64k blocks)?

In an ideal world, both zfs blocksize and s3backer blocksize would be equal, but the goal is to minimize the number of get/put.

Is s3backer PUTting blocks in batch when flushing the cache?

Anyway, zfs reports write errors, probably because it tried to rewrite a 4k block (part of a 1M block in S3) and s3backer, GETting the 1M block, returned a read error. So zfs wrote it elsewhere and considered the error corrected.

I'll try setting blockCacheWriteDelay higher than the default to avoid canceled writes.

And also set minWriteDelay to something higher to try to avoid the stale gets.

The problem is easy to reproduce, so I'm going to do some tests with --minWriteDelay and --md5CacheTime :+1:

I think trim is really demanding on the underlying "hardware", so when s3backer flushes dirty blocks from the cache too often, it can create inconsistencies.

Maybe having a larger cache on local storage could be helpful for intensive i/o operations? And s3backer would write several dirty blocks in a single PUT transaction?

But then, a system crash could potentially utterly corrupt the file system.

I'll try tuning this.

Kirin-kun commented 3 years ago

I tried playing with minWriteDelay and md5CacheTime, but it seems to only delay the issue. After a while, a few stale errors begin to show up, which translate into "Write Errors" in the zpool status.

I suspect trimming the pool at a reduced rate will do the same (just push the stale errors later). I launched a trim with a low byte rate on a test pool and will let it run over the weekend. I'll see how long it takes and whether errors show up later.

It's odd that it only happens on trimming, when s3backer is supposed to delete blocks. I don't know what's inside the blocks it's trying to modify (and failing on). Metadata? When trimming, zfs sends BLKDISCARD, to which I guess s3backer responds by zeroing part of its own block and rewriting it?

I tried a smaller s3backer blocksize to match the filesystem's 4k, and the number of errors grew dramatically, probably because it writes to S3 much more often.

I haven't seen any file corruption yet, because zfs says it corrected the error (even though there's no redundancy).

archiecobbs commented 3 years ago

Is s3backer PUTting blocks in batch when flushing the cache?

Not sure what you mean... it writes each block as a separate request, but there can be many writes in flight simultaneously.

Maybe having a larger cache on local storage could be helpful for intensive i/o operations? And s3backer would write several dirty blocks in a single PUT transaction?

Yes a larger cache (and a long write-back delay) always seems to help things because it absorbs a bunch of noise.

But then, a system crash could potentially utterly corrupt the file system.

In theory, only if your local disk is corrupted. s3backer tries to be careful to sync all local block cache writes to disk before confirming the write to the kernel, so a system crash that doesn't involve any other corruption should allow you to restart s3backer and have it pick up where it left off writing out dirty blocks. You need --blockCacheRecoverDirtyBlocks for this, of course.

Here's more info on trimming and s3backer: https://github.com/archiecobbs/s3backer/wiki/Unused-Block-Deletion

This all makes me wonder if there is a bug in the MD5 cache where the MD5 delay is not being imposed when a write is canceled... this might be easy to detect by capturing the debug output. If you want to try it, this is most easily done by running s3backer in the foreground using -f --debug --debug-http. Configure a long MD5 delay, and verify that consecutive writes (or deletes) to the same block are spaced apart in time, whether canceled or not...

Another possible bug could be due to incorrect assumptions about canceled writes. That is, when you cancel a write you don't really know whether it completed or not.

I haven't worked on the code in a long time so the answers to these questions are not fresh in my mind...

Kirin-kun commented 3 years ago

I traced without --debug-http (too verbose); here is an error sequence during the trim.

Command line (with huge waits, cache and timeouts):

s3backer --directIO --accessEC2IAM=custom_instance_role_s3 --blockSize=1M --size=500G --prefix=tce --listBlocks --region=eu-west-1 --baseURL=https://s3.eu-west-1.amazonaws.com/ --blockHashPrefix --timeout=300 --minWriteDelay=5000 --blockCacheSize=5000 --blockCacheFile=/mnt/data/s3cache --blockCacheRecoverDirtyBlocks --initialRetryPause=10000 --maxRetryPause=50000 --md5CacheSize=20000000 --md5CacheTime=20000 bucket /mnt/s3

I grepped the log for the block that gave errors, to see what was going on.

2020-10-26 11:18:07 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tcec448b72c-bucket
2020-10-26 11:18:09 DEBUG: success: PUT https://bucket.s3.eu-west-1.amazonaws.com/tcec448b72c-bucket
2020-10-26 11:19:02 DEBUG: GET https://bucket.s3.eu-west-1.amazonaws.com/tcec448b72c-bucket
2020-10-26 11:19:02 DEBUG: success: GET https://bucket.s3.eu-west-1.amazonaws.com/tcec448b72c-bucket
2020-10-26 11:19:33 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tcec448b72c-bucket
2020-10-26 11:19:33 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tcec448b72c-bucket
2020-10-26 11:19:38 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tcec448b72c-bucket
2020-10-26 11:19:52 DEBUG: success: PUT https://bucket.s3.eu-west-1.amazonaws.com/tcec448b72c-bucket
2020-10-26 11:21:15 DEBUG: GET https://bucket.s3.eu-west-1.amazonaws.com/tcec448b72c-bucket
2020-10-26 11:21:15 DEBUG: success: GET https://bucket.s3.eu-west-1.amazonaws.com/tcec448b72c-bucket
2020-10-26 11:22:11 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tcec448b72c-bucket
2020-10-26 11:22:13 DEBUG: success: PUT https://bucket.s3.eu-west-1.amazonaws.com/tcec448b72c-bucket
2020-10-26 11:22:28 DEBUG: GET https://bucket.s3.eu-west-1.amazonaws.com/tcec448b72c-bucket
2020-10-26 11:22:28 INFO: rec'd stale content: GET https://bucket.s3.eu-west-1.amazonaws.com/tcec448b72c-bucket
2020-10-26 11:22:38 INFO: retrying query (attempt #2): GET https://bucket.s3.eu-west-1.amazonaws.com/tcec448b72c-bucket
2020-10-26 11:22:38 INFO: rec'd stale content: GET https://bucket.s3.eu-west-1.amazonaws.com/tcec448b72c-bucket
2020-10-26 11:22:58 INFO: retrying query (attempt #3): GET https://bucket.s3.eu-west-1.amazonaws.com/tcec448b72c-bucket
2020-10-26 11:22:58 INFO: rec'd stale content: GET https://bucket.s3.eu-west-1.amazonaws.com/tcec448b72c-bucket
2020-10-26 11:23:18 INFO: retrying query (attempt #4): GET https://bucket.s3.eu-west-1.amazonaws.com/tcec448b72c-bucket
2020-10-26 11:23:18 INFO: rec'd stale content: GET https://bucket.s3.eu-west-1.amazonaws.com/tcec448b72c-bucket
2020-10-26 11:23:18 ERROR: giving up on: GET https://bucket.s3.eu-west-1.amazonaws.com/tcec448b72c-bucket
2020-10-26 11:23:18 DEBUG: GET https://bucket.s3.eu-west-1.amazonaws.com/tcec448b72c-bucket
2020-10-26 11:23:18 DEBUG: success: GET https://bucket.s3.eu-west-1.amazonaws.com/tcec448b72c-bucket

15 seconds after the PUT, there's a GET that gives stale content for this block. That's a bit long for an eventual-consistency problem. And there's a successful GET just after giving up (it discarded the md5 after giving up, I guess, so of course it succeeds?). The md5 cache is far from full.

So, in those 15 seconds between the PUT and the GET, the expected content of the block (in the md5 cache or in S3) somehow changed? It wasn't after a cancellation, though.

There must be a race condition somewhere, but I'm not skilled enough at debugging to know what's going on.

Anyway, this translates only as "Write Errors" in the pool:

zpool status
  pool: S3
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
config:

        NAME         STATE     READ WRITE CKSUM
        S3        ONLINE       0     0     0
          /dev/s3-0  ONLINE       0    11     0  (trimming)

errors: No known data errors

So maybe the md5 cache content for these blocks changes too fast across threads for s3backer to keep up? I'm just making wild guesses here.

Slowing down the trim seems to alleviate the problem somewhat, and write errors are not fatal (zfs just writes the block elsewhere), but it's a bit worrying for the stability of the filesystem.

archiecobbs commented 3 years ago

Thanks for that debugging, it's very helpful. I don't see anything s3backer is doing that looks obviously wrong.

Can you run the same test again, after applying this patch? This will help figure out what's going on. Thanks!

diff --git a/http_io.c b/http_io.c
index 2de5007..92fdf14 100644
--- a/http_io.c
+++ b/http_io.c
@@ -1796,6 +1796,7 @@ http_io_perform_io(struct http_io_private *priv, struct http_io *io, http_io_cur
 {
     struct http_io_conf *const config = priv->config;
     struct timespec delay;
+    char debug_extra[1024];
     CURLcode curl_code;
     u_int retry_pause = 0;
     u_int total_pause;
@@ -1803,13 +1804,42 @@ http_io_perform_io(struct http_io_private *priv, struct http_io *io, http_io_cur
     double clen;
     int attempt;
     CURL *curl;
+    char *debug_req_end;
+
+    /* Extra debug */
+#define ADD_REQ_HEADER(name)                                                                                    \
+    {                                                                                                           \
+        const struct curl_slist *header;                                                                        \
+        for (header = io->headers; header != NULL; header = header->next) {                                     \
+            if (strncasecmp(header->data, name, strlen(name)) == 0) {                                           \
+                snprintf(debug_extra + strlen(debug_extra), sizeof(debug_extra) - strlen(debug_extra),          \
+                  "\n  REQ:%s", header->data);                                                                  \
+            }                                                                                                   \
+        }                                                                                                       \
+    }
+    debug_extra[0] = '\0';
+    ADD_REQ_HEADER(MD5_HEADER);
+    ADD_REQ_HEADER(ETAG_HEADER);
+    ADD_REQ_HEADER(HMAC_HEADER);
+    ADD_REQ_HEADER(IF_MATCH_HEADER);
+    ADD_REQ_HEADER(IF_NONE_MATCH_HEADER);

     /* Debug */
     if (config->debug)
-        (*config->log)(LOG_DEBUG, "%s %s", io->method, io->url);
+        (*config->log)(LOG_DEBUG, "%s %s%s", io->method, io->url, debug_extra);

     /* Make attempts */
+    debug_req_end = debug_extra + strlen(debug_extra);
     for (attempt = 0, total_pause = 0; 1; attempt++, total_pause += retry_pause) {
+        *debug_req_end = '\0';
+
+        /* Reset fields that are populated from response headers (in case this is not the first attempt) */
+        io->file_size = 0;
+        io->block_size = 0;
+        io->mount_token = 0;
+        memset(io->md5, 0, sizeof(io->md5));
+        memset(io->hmac, 0, sizeof(io->hmac));
+        memset(io->content_encoding, 0, sizeof(io->content_encoding));

         /* Acquire and initialize CURL instance */
         if ((curl = http_io_acquire_curl(priv, io)) == NULL)
@@ -1818,9 +1848,23 @@ http_io_perform_io(struct http_io_private *priv, struct http_io *io, http_io_cur

         /* Perform HTTP operation and check result */
         if (attempt > 0)
-            (*config->log)(LOG_INFO, "retrying query (attempt #%d): %s %s", attempt + 1, io->method, io->url);
+            (*config->log)(LOG_INFO, "retrying query (attempt #%d): %s %s%s", attempt + 1, io->method, io->url, debug_extra);
         curl_code = curl_easy_perform(curl);

+        /* Add response header info to extra debug logging */
+        if (memcmp(io->md5, zero_md5, sizeof(io->md5)) != 0) {
+            char md5buf[MD5_DIGEST_LENGTH * 2 + 1];
+            http_io_prhex(md5buf, io->md5, MD5_DIGEST_LENGTH);
+            snprintf(debug_extra + strlen(debug_extra), sizeof(debug_extra) - strlen(debug_extra),
+              "\n RESP:%s: \"%s\"", ETAG_HEADER, md5buf);
+        }
+        if (memcmp(io->hmac, zero_hmac, sizeof(io->hmac)) != 0) {
+            char hmacbuf[SHA_DIGEST_LENGTH * 2 + 1];
+            http_io_prhex(hmacbuf, io->hmac, SHA_DIGEST_LENGTH);
+            snprintf(debug_extra + strlen(debug_extra), sizeof(debug_extra) - strlen(debug_extra),
+              "\n RESP:%s: \"%s\"", HMAC_HEADER, hmacbuf);
+        }
+
         /* Find out what the HTTP result code was (if any) */
         switch (curl_code) {
         case CURLE_HTTP_RETURNED_ERROR:
@@ -1850,7 +1894,7 @@ http_io_perform_io(struct http_io_private *priv, struct http_io *io, http_io_cur

             /* Extra debug logging */
             if (config->debug)
-                (*config->log)(LOG_DEBUG, "success: %s %s", io->method, io->url);
+                (*config->log)(LOG_DEBUG, "success: %s %s%s", io->method, io->url, debug_extra);

             /* Extract timing info */
             if ((curl_code = curl_easy_getinfo(curl, CURLINFO_TOTAL_TIME, &curl_time)) != CURLE_OK) {
@@ -1897,13 +1941,13 @@ http_io_perform_io(struct http_io_private *priv, struct http_io *io, http_io_cur
         switch (curl_code) {
         case CURLE_ABORTED_BY_CALLBACK:
             if (config->debug)
-                (*config->log)(LOG_DEBUG, "write aborted: %s %s", io->method, io->url);
+                (*config->log)(LOG_DEBUG, "write aborted: %s %s%s", io->method, io->url, debug_extra);
             pthread_mutex_lock(&priv->mutex);
             priv->stats.http_canceled_writes++;
             pthread_mutex_unlock(&priv->mutex);
             return ECONNABORTED;
         case CURLE_OPERATION_TIMEDOUT:
-            (*config->log)(LOG_NOTICE, "operation timeout: %s %s", io->method, io->url);
+            (*config->log)(LOG_NOTICE, "operation timeout: %s %s%s", io->method, io->url, debug_extra);
             pthread_mutex_lock(&priv->mutex);
             priv->stats.curl_timeouts++;
             pthread_mutex_unlock(&priv->mutex);
@@ -1912,22 +1956,22 @@ http_io_perform_io(struct http_io_private *priv, struct http_io *io, http_io_cur
             switch (http_code) {
             case HTTP_NOT_FOUND:
                 if (config->debug)
-                    (*config->log)(LOG_DEBUG, "rec'd %ld response: %s %s", http_code, io->method, io->url);
+                    (*config->log)(LOG_DEBUG, "rec'd %ld response: %s %s%s", http_code, io->method, io->url, debug_extra);
                 return ENOENT;
             case HTTP_UNAUTHORIZED:
-                (*config->log)(LOG_ERR, "rec'd %ld response: %s %s", http_code, io->method, io->url);
+                (*config->log)(LOG_ERR, "rec'd %ld response: %s %s%s", http_code, io->method, io->url, debug_extra);
                 pthread_mutex_lock(&priv->mutex);
                 priv->stats.http_unauthorized++;
                 pthread_mutex_unlock(&priv->mutex);
                 return EACCES;
             case HTTP_FORBIDDEN:
-                (*config->log)(LOG_ERR, "rec'd %ld response: %s %s", http_code, io->method, io->url);
+                (*config->log)(LOG_ERR, "rec'd %ld response: %s %s%s", http_code, io->method, io->url, debug_extra);
                 pthread_mutex_lock(&priv->mutex);
                 priv->stats.http_forbidden++;
                 pthread_mutex_unlock(&priv->mutex);
                 return EPERM;
             case HTTP_PRECONDITION_FAILED:
-                (*config->log)(LOG_INFO, "rec'd stale content: %s %s", io->method, io->url);
+                (*config->log)(LOG_INFO, "rec'd stale content: %s %s%s", io->method, io->url, debug_extra);
                 pthread_mutex_lock(&priv->mutex);
                 priv->stats.http_stale++;
                 pthread_mutex_unlock(&priv->mutex);
@@ -1935,12 +1979,12 @@ http_io_perform_io(struct http_io_private *priv, struct http_io *io, http_io_cur
             case HTTP_NOT_MODIFIED:
                 if (io->expect_304) {
                     if (config->debug)
-                        (*config->log)(LOG_DEBUG, "rec'd %ld response: %s %s", http_code, io->method, io->url);
+                        (*config->log)(LOG_DEBUG, "rec'd %ld response: %s %s%s", http_code, io->method, io->url, debug_extra);
                     return EEXIST;
                 }
                 /* FALLTHROUGH */
             default:
-                (*config->log)(LOG_ERR, "rec'd %ld response: %s %s", http_code, io->method, io->url);
+                (*config->log)(LOG_ERR, "rec'd %ld response: %s %s%s", http_code, io->method, io->url, debug_extra);
                 pthread_mutex_lock(&priv->mutex);
                 switch (http_code / 100) {
                 case 4:
@@ -1997,7 +2041,8 @@ http_io_perform_io(struct http_io_private *priv, struct http_io *io, http_io_cur
     }

     /* Give up */
-    (*config->log)(LOG_ERR, "giving up on: %s %s", io->method, io->url);
+    *debug_req_end = '\0';
+    (*config->log)(LOG_ERR, "giving up on: %s %s%s", io->method, io->url, debug_extra);
     return EIO;
 }

Kirin-kun commented 3 years ago

Here is the result: a "grep -A1" (the curl command plus the following line) for the first block that gave an error.

Actually, it fails on the very first GET, 15 seconds after the PUT. And it requests (If-Match) a different md5 anyway? And the one it gets after giving up is different again...

At the end, there's a PUT with "REQ:Content-MD5: 8ukkg1ReTFq+w2/RS5gHdA==" (and success), but a GET nearly a minute later returns yet another md5 in the RESP header, and succeeds. The block was probably no longer in the cache.

So I don't know where the problem is: either on the S3 end or on the md5 cache end. And it only translates into "Write Errors" that get corrected, probably by writing elsewhere.

I did a massive rsync of hundreds of Gbs to stress it a bit and didn't get any error.

Only the zpool trim without limiting the rate gives that kind of write errors.

2020-10-27 09:43:41 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: JnxPpjf6ZHAMXg0qY9AhFQ==
--
2020-10-27 09:43:41 DEBUG: success: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: JnxPpjf6ZHAMXg0qY9AhFQ==
--
2020-10-27 09:43:46 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: e0QFKjzDfRhZPIK9G39BLQ==
--
2020-10-27 09:43:46 DEBUG: success: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: e0QFKjzDfRhZPIK9G39BLQ==
--
2020-10-27 09:43:55 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: h0HitjvLHuWBxSPxzUP/LA==
--
2020-10-27 09:43:56 DEBUG: success: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: h0HitjvLHuWBxSPxzUP/LA==
--
2020-10-27 09:44:01 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: fY5LFYFzBmVf9zFCO8uphQ==
--
2020-10-27 09:44:01 DEBUG: success: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: fY5LFYFzBmVf9zFCO8uphQ==
--
2020-10-27 09:44:06 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: YGP05dvE3BwKg90jvFE9qQ==
--
2020-10-27 09:44:06 DEBUG: success: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: YGP05dvE3BwKg90jvFE9qQ==
--
2020-10-27 09:44:23 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: tstvPeLK+wzgmVb6wSXX9g==
--
2020-10-27 09:44:23 DEBUG: success: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: tstvPeLK+wzgmVb6wSXX9g==
--
2020-10-27 09:44:29 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: APCSiBwfsvnh1pg+hXalFA==
--
2020-10-27 09:44:29 DEBUG: success: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: APCSiBwfsvnh1pg+hXalFA==
--
2020-10-27 09:44:34 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: J0Eg2bcsfOVNCiI5EE/Iyg==
--
2020-10-27 09:44:34 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: J0Eg2bcsfOVNCiI5EE/Iyg==
--
2020-10-27 09:44:39 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: JosPFnzn/f0+ast2PiEjdw==
--
2020-10-27 09:44:39 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: JosPFnzn/f0+ast2PiEjdw==
--
2020-10-27 09:44:44 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: Vww08Glv7AHAKrqM9Z3lOQ==
--
2020-10-27 09:44:44 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: Vww08Glv7AHAKrqM9Z3lOQ==
--
2020-10-27 09:44:49 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: UNFsBY8XsjB43zEiEbWUtA==
--
2020-10-27 09:44:49 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: UNFsBY8XsjB43zEiEbWUtA==
--
2020-10-27 09:44:54 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: pc2M5d0RspCGskOE/NpKbQ==
--
2020-10-27 09:44:54 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: pc2M5d0RspCGskOE/NpKbQ==
--
2020-10-27 09:44:59 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: WeyiPjg1dyXJCTrGn9RLOQ==
--
2020-10-27 09:45:00 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: WeyiPjg1dyXJCTrGn9RLOQ==
--
2020-10-27 09:45:05 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: Uv84pQUAyIB72sBZPb+AGA==
--
2020-10-27 09:45:05 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: Uv84pQUAyIB72sBZPb+AGA==
--
2020-10-27 09:45:10 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: 11iUq6B9haM/mPFIpGXubQ==
--
2020-10-27 09:45:10 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: 11iUq6B9haM/mPFIpGXubQ==
--
2020-10-27 09:45:15 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: A8Xbasal0nimEfBG4i6cMA==
2020-10-27 09:45:15 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: A8Xbasal0nimEfBG4i6cMA==
--
2020-10-27 09:45:20 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: r3BP1uKziJn1KnAiUmc8Yw==
2020-10-27 09:45:20 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: r3BP1uKziJn1KnAiUmc8Yw==
--
2020-10-27 09:45:25 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: /tbtPQGgJNtalU9O0SOZGA==
--
2020-10-27 09:45:25 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: /tbtPQGgJNtalU9O0SOZGA==
--
2020-10-27 09:45:30 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: eC3WEi2+1dKLNFpeXjE1EA==
--
2020-10-27 09:45:30 DEBUG: success: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: eC3WEi2+1dKLNFpeXjE1EA==
--
2020-10-27 09:45:35 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: dCoUrs1qIlZNNGuB5yxfOw==
--
2020-10-27 09:45:35 DEBUG: success: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: dCoUrs1qIlZNNGuB5yxfOw==
--
2020-10-27 09:45:40 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: XHasoNy6MkqGg74yB6ptVQ==
--
2020-10-27 09:45:40 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: XHasoNy6MkqGg74yB6ptVQ==
--
2020-10-27 09:45:45 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: kQV8TEvVbTdu/Ypfqvp67g==
--
2020-10-27 09:45:45 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: kQV8TEvVbTdu/Ypfqvp67g==
--
2020-10-27 09:45:50 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: FtcgCHZOlOscJaEd81yIoA==
--
2020-10-27 09:45:50 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: FtcgCHZOlOscJaEd81yIoA==
--
2020-10-27 09:45:55 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: 64Pqm/LM6B7BJDJoESJQjw==
--
2020-10-27 09:45:55 DEBUG: success: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: 64Pqm/LM6B7BJDJoESJQjw==
--
2020-10-27 09:46:05 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: cyyNA3zymbREhQQEQq0q4A==
--
2020-10-27 09:46:10 DEBUG: success: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: cyyNA3zymbREhQQEQq0q4A==
--
2020-10-27 09:46:28 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: u+FxbIhQuQt+KXxLZxOM4Q==
--
2020-10-27 09:46:36 DEBUG: success: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: u+FxbIhQuQt+KXxLZxOM4Q==
--
2020-10-27 09:46:50 DEBUG: GET https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:If-Match: "bbe1716c8850b90b7e297c4b67138ce1"
2020-10-27 09:46:51 INFO: rec'd stale content: GET https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:If-Match: "bbe1716c8850b90b7e297c4b67138ce1"
--
2020-10-27 09:47:01 INFO: retrying query (attempt #2): GET https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:If-Match: "bbe1716c8850b90b7e297c4b67138ce1"
2020-10-27 09:47:01 INFO: rec'd stale content: GET https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:If-Match: "bbe1716c8850b90b7e297c4b67138ce1"
2020-10-27 09:47:21 INFO: retrying query (attempt #3): GET https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:If-Match: "bbe1716c8850b90b7e297c4b67138ce1"
2020-10-27 09:47:21 INFO: rec'd stale content: GET https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:If-Match: "bbe1716c8850b90b7e297c4b67138ce1"
2020-10-27 09:47:41 INFO: retrying query (attempt #4): GET https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:If-Match: "bbe1716c8850b90b7e297c4b67138ce1"
2020-10-27 09:47:41 INFO: rec'd stale content: GET https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:If-Match: "bbe1716c8850b90b7e297c4b67138ce1"
2020-10-27 09:47:41 ERROR: giving up on: GET https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:If-Match: "bbe1716c8850b90b7e297c4b67138ce1"
--
2020-10-27 09:47:44 DEBUG: GET https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
2020-10-27 09:47:44 DEBUG: success: GET https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
 RESP:ETag: "302e943a37270eaf5e379356af47e4cb"
--
2020-10-27 09:47:45 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: zki3c6qvf1ht8pa2omj+JQ==
--
2020-10-27 09:47:45 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: zki3c6qvf1ht8pa2omj+JQ==
--
2020-10-27 09:47:50 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: LLFJge36YUlhtNsSawOqLA==
--
2020-10-27 09:47:50 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: LLFJge36YUlhtNsSawOqLA==
--
2020-10-27 09:47:55 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: Ec+gu+6FcmLjyq5erj1Vpg==
--
2020-10-27 09:47:55 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: Ec+gu+6FcmLjyq5erj1Vpg==
--
2020-10-27 09:48:00 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: UtZn5OLzIxcfRPKLAlqvpA==
--
2020-10-27 09:48:00 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: UtZn5OLzIxcfRPKLAlqvpA==
--
2020-10-27 09:48:05 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: oy3oSXpN4xzL7O3iNEcX0g==
--
2020-10-27 09:48:05 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: oy3oSXpN4xzL7O3iNEcX0g==
--
2020-10-27 09:48:10 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: kki7u4c2T4iWmphEZtHfxA==
--
2020-10-27 09:48:10 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: kki7u4c2T4iWmphEZtHfxA==
--
2020-10-27 09:48:15 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: wyVioL9qUsEbPS+3Lk9OdA==
2020-10-27 09:48:15 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: wyVioL9qUsEbPS+3Lk9OdA==
--
2020-10-27 09:48:20 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: cu9WA4Q0m3vuJ60b+lymdA==
2020-10-27 09:48:20 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: cu9WA4Q0m3vuJ60b+lymdA==
--
2020-10-27 09:48:25 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: cBMmGBvabP+u8l+ttknPRQ==
--
2020-10-27 09:48:25 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: cBMmGBvabP+u8l+ttknPRQ==
--
2020-10-27 09:48:30 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: Sc47VTnweTttWQbjVUIe0g==
2020-10-27 09:48:30 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: Sc47VTnweTttWQbjVUIe0g==
--
2020-10-27 09:48:35 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: YthLeSFNYw/3Z36MjBOXIA==
2020-10-27 09:48:35 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: YthLeSFNYw/3Z36MjBOXIA==
--
2020-10-27 09:48:40 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: mGtkaD3e2iyKPPPCfvZ9Uw==
2020-10-27 09:48:40 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: mGtkaD3e2iyKPPPCfvZ9Uw==
--
2020-10-27 09:48:45 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: oleEV9iW9kD0QdK9A2WYbA==
2020-10-27 09:48:45 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: oleEV9iW9kD0QdK9A2WYbA==
--
2020-10-27 09:48:50 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: hSZ761n8qFtmQRVEgf5XGg==
--
2020-10-27 09:48:50 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: hSZ761n8qFtmQRVEgf5XGg==
--
2020-10-27 09:48:55 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: Jpf1QdhUXya6b8QrN3J+jQ==
--
2020-10-27 09:48:55 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: Jpf1QdhUXya6b8QrN3J+jQ==
--
2020-10-27 09:49:00 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: WaLCfJ/4gfZNWbWLFL7N9w==
--
2020-10-27 09:49:00 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: WaLCfJ/4gfZNWbWLFL7N9w==
--
2020-10-27 09:49:05 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: kBfM22ATCt9sLZtLiOLA4g==
--
2020-10-27 09:49:06 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: kBfM22ATCt9sLZtLiOLA4g==
--
2020-10-27 09:49:11 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: 8ukkg1ReTFq+w2/RS5gHdA==
--
2020-10-27 09:49:19 DEBUG: success: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: 8ukkg1ReTFq+w2/RS5gHdA==
--
2020-10-27 09:50:06 DEBUG: GET https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
2020-10-27 09:50:06 DEBUG: success: GET https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
 RESP:ETag: "8a0c22b35a05435a207138641a31e11d"
--
2020-10-27 09:50:10 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: btOUGnUErDMpNTHoL1Z+iA==
--
2020-10-27 09:50:11 DEBUG: success: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: btOUGnUErDMpNTHoL1Z+iA==
archiecobbs commented 3 years ago

Thanks... we're getting somewhere I think...

First: I'm not sure grep -A1 is capturing all of the relevant output; in some cases there may be more than one extra line logged, so lines might be getting chopped off (I can't tell for sure).

If you can install logwarn on your system, you can use it like this to pull out the relevant lines:

$ logwarn -f /dev/null -zp -m ^2020 mylogfile tce0a806557-00464013

Of course, you'll need to replace the mylogfile and tce0a806557-00464013 arguments.

Secondly: you can use the following tool to convert between Content-MD5 base 64 values and ETag hex values:

#!/bin/bash

ARG="$1"
if [[ "${ARG}" =~ ^[[:xdigit:]]{32}$ ]]; then
    echo "${ARG}" | xxd -r -p | openssl enc -base64
elif [[ "${ARG}" =~ ^[[:alnum:]/+]{22}==$ ]]; then
    echo "${ARG}" | openssl enc -d -base64 | hexdump -e '1/1 "%02x"' && echo
else
    echo "Error: can't interpret \"${ARG}\"" 1>&2
    exit 1
fi

With the tool you can see that If-Match: "bbe1716c8850b90b7e297c4b67138ce1" and Content-MD5: u+FxbIhQuQt+KXxLZxOM4Q== are actually referring to the same value, so the If-Match expectation is correct.
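If you'd rather not save the script, the same check can be done inline with coreutils (a quick sketch using the values from the log above; assumes base64 and od are available):

```shell
# Decode the Content-MD5 base-64 value and compare it to the ETag hex.
MD5_B64='u+FxbIhQuQt+KXxLZxOM4Q=='
ETAG_HEX='bbe1716c8850b90b7e297c4b67138ce1'
DECODED=$(printf '%s' "${MD5_B64}" | base64 -d | od -An -tx1 | tr -d ' \n')
if [ "${DECODED}" = "${ETAG_HEX}" ]; then
    echo "match: the If-Match expectation is correct"
else
    echo "mismatch: ${DECODED}"
fi
```

This prints the "match" line, confirming the two headers encode the same 16-byte MD5 digest.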

Thirdly: here is what is weird so far: the GET that finally succeeds returns with ETag: "302e943a37270eaf5e379356af47e4cb", which would correspond to a Content-MD5 header of MC6UOjcnDq9eN5NWr0fkyw==, but none of the previous PUTs had that value.

That means this is starting to look more like a data corruption problem than an eventual consistency problem....

Kirin-kun commented 3 years ago

I'll do that tomorrow. I don't have access to my instance until then.

But I wanted to mention that ZFS normally stores a checksum of a block's data in the metadata, i.e., in a different block from the data itself. If the data somehow becomes corrupted and doesn't match the checksum, ZFS automatically tries to correct it with the RAID's help. But here that would be impossible, since there's no redundancy. And duplicating the data and the network traffic in S3 would be nonsensical anyway.

So, if the data it reads in the GET were corrupted, ZFS would error on it. But I have no read errors. Which means that the data it read with the MC6UOjcnDq9eN5NWr0fkyw== MD5 was what ZFS was expecting. Or at least, the part that was interesting to ZFS at that moment.

I wonder if it's related to the ZFS trim operation, because trimming is about discarding blocks, so checking the checksum would be useless. But the s3backer blocks are larger than the ZFS blocks.

archiecobbs commented 3 years ago

So, if the data it reads in the GET were corrupted, ZFS would error on it. But I have no read errors. Which means that the data it read with the MC6UOjcnDq9eN5NWr0fkyw== MD5 was what ZFS was expecting. Or at least, the part that was interesting to ZFS at that moment.

Right... checksums are only checked on blocks ZFS cares about; it likely doesn't care about blocks it has discarded or deallocated.

When you get a chance to run this again, see if you can find the ETag value returned by the first successful GET somewhere earlier in the trace (you'll have to convert to base 64 first). I.e., I wonder where that value is coming from??
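For the hex-to-base-64 direction (to grep the trace for the converted value), a one-liner along these lines should also work; it assumes a shell whose printf understands \xHH escapes, such as bash:

```shell
# Convert the successful GET's ETag hex back to a Content-MD5-style value.
# The inner printf|sed turns "302e..." into "\x30\x2e...", which the outer
# printf expands into raw bytes for base64 to encode.
ETAG='302e943a37270eaf5e379356af47e4cb'
printf "$(printf '%s' "${ETAG}" | sed -e 's/../\\x&/g')" | base64
```

This should print MC6UOjcnDq9eN5NWr0fkyw==, the value to search for among the earlier PUT lines.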

archiecobbs commented 3 years ago

Ah... are you using SSE-C or SSE-KMS for encryption?

I don't know what those acronyms represent, but according to this they will cause the ETag header to be wrong...

Kirin-kun commented 3 years ago

Our buckets are automatically encrypted at the AWS level with the AWS-KMS key. It's a policy enforced by my organization. I don't have to specify it and I can't change it.

image

That would explain why the ETag doesn't seem to correspond to a previous PUT. It may actually correspond though?

I didn't have to give the --sse parameter for s3backer to work with it, as AWS enforces the encryption automatically with the default aws:kms key if the header is not present.

I tried drastically increasing the md5CacheTime and cache sizes, as well as the timeouts, like this:

s3backer --directIO --accessEC2IAM=instance_role_s3 --blockSize=1M --size=500G --prefix=tce --listBlocks --region=eu-west-1 --baseURL=https://s3.eu-west-1.amazonaws.com/ --blockHashPrefix --timeout=300 --minWriteDelay=60000 --blockCacheSize=15000 --blockCacheFile=/mnt/data/s3cache --blockCacheRecoverDirtyBlocks --initialRetryPause=10000 --maxRetryPause=50000 --md5CacheTime=60000 --md5CacheSize=20971520 bucket /mnt/s3

And the problem arose later, and probably less often. I let the trim run all night and had 77 write errors and 2 read errors this morning, though ZFS corrected them all.

I checked with logwarn, and everything I copy/pasted earlier was all there was:


./logwarn -f /dev/null -zp -m ^2020 /mnt/data/s3b.log tce0a806557-00464013
2020-10-27 09:43:41 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: JnxPpjf6ZHAMXg0qY9AhFQ==
2020-10-27 09:43:41 DEBUG: success: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: JnxPpjf6ZHAMXg0qY9AhFQ==
2020-10-27 09:43:46 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: e0QFKjzDfRhZPIK9G39BLQ==
2020-10-27 09:43:46 DEBUG: success: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: e0QFKjzDfRhZPIK9G39BLQ==
2020-10-27 09:43:55 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: h0HitjvLHuWBxSPxzUP/LA==
2020-10-27 09:43:56 DEBUG: success: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: h0HitjvLHuWBxSPxzUP/LA==
2020-10-27 09:44:01 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: fY5LFYFzBmVf9zFCO8uphQ==
2020-10-27 09:44:01 DEBUG: success: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: fY5LFYFzBmVf9zFCO8uphQ==
2020-10-27 09:44:06 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: YGP05dvE3BwKg90jvFE9qQ==
2020-10-27 09:44:06 DEBUG: success: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: YGP05dvE3BwKg90jvFE9qQ==
2020-10-27 09:44:23 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: tstvPeLK+wzgmVb6wSXX9g==
2020-10-27 09:44:23 DEBUG: success: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: tstvPeLK+wzgmVb6wSXX9g==
2020-10-27 09:44:29 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: APCSiBwfsvnh1pg+hXalFA==
2020-10-27 09:44:29 DEBUG: success: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: APCSiBwfsvnh1pg+hXalFA==
2020-10-27 09:44:34 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: J0Eg2bcsfOVNCiI5EE/Iyg==
2020-10-27 09:44:34 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: J0Eg2bcsfOVNCiI5EE/Iyg==
2020-10-27 09:44:39 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: JosPFnzn/f0+ast2PiEjdw==
2020-10-27 09:44:39 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: JosPFnzn/f0+ast2PiEjdw==
2020-10-27 09:44:44 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: Vww08Glv7AHAKrqM9Z3lOQ==
2020-10-27 09:44:44 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: Vww08Glv7AHAKrqM9Z3lOQ==
2020-10-27 09:44:49 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: UNFsBY8XsjB43zEiEbWUtA==
2020-10-27 09:44:49 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: UNFsBY8XsjB43zEiEbWUtA==
2020-10-27 09:44:54 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: pc2M5d0RspCGskOE/NpKbQ==
2020-10-27 09:44:54 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: pc2M5d0RspCGskOE/NpKbQ==
2020-10-27 09:44:59 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: WeyiPjg1dyXJCTrGn9RLOQ==
2020-10-27 09:45:00 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: WeyiPjg1dyXJCTrGn9RLOQ==
2020-10-27 09:45:05 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: Uv84pQUAyIB72sBZPb+AGA==
2020-10-27 09:45:05 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: Uv84pQUAyIB72sBZPb+AGA==
2020-10-27 09:45:10 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: 11iUq6B9haM/mPFIpGXubQ==
2020-10-27 09:45:10 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: 11iUq6B9haM/mPFIpGXubQ==
2020-10-27 09:45:15 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: A8Xbasal0nimEfBG4i6cMA==
2020-10-27 09:45:15 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: A8Xbasal0nimEfBG4i6cMA==
2020-10-27 09:45:20 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: r3BP1uKziJn1KnAiUmc8Yw==
2020-10-27 09:45:20 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: r3BP1uKziJn1KnAiUmc8Yw==
2020-10-27 09:45:25 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: /tbtPQGgJNtalU9O0SOZGA==
2020-10-27 09:45:25 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: /tbtPQGgJNtalU9O0SOZGA==
2020-10-27 09:45:30 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: eC3WEi2+1dKLNFpeXjE1EA==
2020-10-27 09:45:30 DEBUG: success: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: eC3WEi2+1dKLNFpeXjE1EA==
2020-10-27 09:45:35 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: dCoUrs1qIlZNNGuB5yxfOw==
2020-10-27 09:45:35 DEBUG: success: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: dCoUrs1qIlZNNGuB5yxfOw==
2020-10-27 09:45:40 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: XHasoNy6MkqGg74yB6ptVQ==
2020-10-27 09:45:40 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: XHasoNy6MkqGg74yB6ptVQ==
2020-10-27 09:45:45 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: kQV8TEvVbTdu/Ypfqvp67g==
2020-10-27 09:45:45 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: kQV8TEvVbTdu/Ypfqvp67g==
2020-10-27 09:45:50 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: FtcgCHZOlOscJaEd81yIoA==
2020-10-27 09:45:50 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: FtcgCHZOlOscJaEd81yIoA==
2020-10-27 09:45:55 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: 64Pqm/LM6B7BJDJoESJQjw==
2020-10-27 09:45:55 DEBUG: success: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: 64Pqm/LM6B7BJDJoESJQjw==
2020-10-27 09:46:05 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: cyyNA3zymbREhQQEQq0q4A==
2020-10-27 09:46:10 DEBUG: success: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: cyyNA3zymbREhQQEQq0q4A==
2020-10-27 09:46:28 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: u+FxbIhQuQt+KXxLZxOM4Q==
2020-10-27 09:46:36 DEBUG: success: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: u+FxbIhQuQt+KXxLZxOM4Q==
2020-10-27 09:46:50 DEBUG: GET https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:If-Match: "bbe1716c8850b90b7e297c4b67138ce1"
2020-10-27 09:46:51 INFO: rec'd stale content: GET https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:If-Match: "bbe1716c8850b90b7e297c4b67138ce1"
2020-10-27 09:47:01 INFO: retrying query (attempt #2): GET https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:If-Match: "bbe1716c8850b90b7e297c4b67138ce1"
2020-10-27 09:47:01 INFO: rec'd stale content: GET https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:If-Match: "bbe1716c8850b90b7e297c4b67138ce1"
2020-10-27 09:47:21 INFO: retrying query (attempt #3): GET https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:If-Match: "bbe1716c8850b90b7e297c4b67138ce1"
2020-10-27 09:47:21 INFO: rec'd stale content: GET https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:If-Match: "bbe1716c8850b90b7e297c4b67138ce1"
2020-10-27 09:47:41 INFO: retrying query (attempt #4): GET https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:If-Match: "bbe1716c8850b90b7e297c4b67138ce1"
2020-10-27 09:47:41 INFO: rec'd stale content: GET https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:If-Match: "bbe1716c8850b90b7e297c4b67138ce1"
2020-10-27 09:47:41 ERROR: giving up on: GET https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:If-Match: "bbe1716c8850b90b7e297c4b67138ce1"
2020-10-27 09:47:44 DEBUG: GET https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
2020-10-27 09:47:44 DEBUG: success: GET https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
 RESP:ETag: "302e943a37270eaf5e379356af47e4cb"
2020-10-27 09:47:45 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: zki3c6qvf1ht8pa2omj+JQ==
2020-10-27 09:47:45 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: zki3c6qvf1ht8pa2omj+JQ==
2020-10-27 09:47:50 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: LLFJge36YUlhtNsSawOqLA==
2020-10-27 09:47:50 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: LLFJge36YUlhtNsSawOqLA==
2020-10-27 09:47:55 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: Ec+gu+6FcmLjyq5erj1Vpg==
2020-10-27 09:47:55 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: Ec+gu+6FcmLjyq5erj1Vpg==
2020-10-27 09:48:00 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: UtZn5OLzIxcfRPKLAlqvpA==
2020-10-27 09:48:00 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: UtZn5OLzIxcfRPKLAlqvpA==
2020-10-27 09:48:05 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: oy3oSXpN4xzL7O3iNEcX0g==
2020-10-27 09:48:05 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: oy3oSXpN4xzL7O3iNEcX0g==
2020-10-27 09:48:10 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: kki7u4c2T4iWmphEZtHfxA==
2020-10-27 09:48:10 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: kki7u4c2T4iWmphEZtHfxA==
2020-10-27 09:48:15 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: wyVioL9qUsEbPS+3Lk9OdA==
2020-10-27 09:48:15 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: wyVioL9qUsEbPS+3Lk9OdA==
2020-10-27 09:48:20 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: cu9WA4Q0m3vuJ60b+lymdA==
2020-10-27 09:48:20 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: cu9WA4Q0m3vuJ60b+lymdA==
2020-10-27 09:48:25 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: cBMmGBvabP+u8l+ttknPRQ==
2020-10-27 09:48:25 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: cBMmGBvabP+u8l+ttknPRQ==
2020-10-27 09:48:30 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: Sc47VTnweTttWQbjVUIe0g==
2020-10-27 09:48:30 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: Sc47VTnweTttWQbjVUIe0g==
2020-10-27 09:48:35 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: YthLeSFNYw/3Z36MjBOXIA==
2020-10-27 09:48:35 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: YthLeSFNYw/3Z36MjBOXIA==
2020-10-27 09:48:40 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: mGtkaD3e2iyKPPPCfvZ9Uw==
2020-10-27 09:48:40 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: mGtkaD3e2iyKPPPCfvZ9Uw==
2020-10-27 09:48:45 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: oleEV9iW9kD0QdK9A2WYbA==
2020-10-27 09:48:45 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: oleEV9iW9kD0QdK9A2WYbA==
2020-10-27 09:48:50 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: hSZ761n8qFtmQRVEgf5XGg==
2020-10-27 09:48:50 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: hSZ761n8qFtmQRVEgf5XGg==
2020-10-27 09:48:55 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: Jpf1QdhUXya6b8QrN3J+jQ==
2020-10-27 09:48:55 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: Jpf1QdhUXya6b8QrN3J+jQ==
2020-10-27 09:49:00 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: WaLCfJ/4gfZNWbWLFL7N9w==
2020-10-27 09:49:00 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: WaLCfJ/4gfZNWbWLFL7N9w==
2020-10-27 09:49:05 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: kBfM22ATCt9sLZtLiOLA4g==
2020-10-27 09:49:06 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: kBfM22ATCt9sLZtLiOLA4g==
2020-10-27 09:49:11 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: 8ukkg1ReTFq+w2/RS5gHdA==
2020-10-27 09:49:19 DEBUG: success: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: 8ukkg1ReTFq+w2/RS5gHdA==
2020-10-27 09:50:06 DEBUG: GET https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
2020-10-27 09:50:06 DEBUG: success: GET https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
 RESP:ETag: "8a0c22b35a05435a207138641a31e11d"
2020-10-27 09:50:10 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: btOUGnUErDMpNTHoL1Z+iA==
2020-10-27 09:50:11 DEBUG: success: PUT https://bucket.s3.eu-west-1.amazonaws.com/tce0a806557-00464013
  REQ:Content-MD5: btOUGnUErDMpNTHoL1Z+iA==
archiecobbs commented 3 years ago

OK thanks, that explains everything.

Presumably the ETag that S3 is returning is for the encrypted content rather than the unencrypted content.

However, this is new, non-backward-compatible behavior that Amazon has added since s3backer was originally written.

In this situation, s3backer is going to report errors, but they are really just phantom errors. Once the retries are exhausted and it reads the block again from scratch, the original, intact data will be returned.

I will have to look into how to properly handle this.

Kirin-kun commented 3 years ago

Since all the blocks are encrypted anyway, why does it return an error only for some of them and only when trimming?

When does s3backer compare the retrieved ETag with the MD5 in the cache?

Kirin-kun commented 3 years ago

So, after thinking about it: it's s3backer's protection mechanism against eventual-consistency problems that triggers the errors?

It writes a block, and on the next GET, since it still has the MD5 in the cache, it checks it to verify that the block is consistent. The ETag doesn't match because of the encryption, so it errors.

And it only happens at all because, when trimming, the blocks are written and read more often, so the MD5 cache still holds the MD5 of those blocks.

That means that if there were no MD5 cache, there would be no errors at all?

That defeats the purpose of the cache.
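If I understand the mechanism right, it can be sketched like this (a simplified illustration only, not s3backer's actual C code; the function names are made up):

```shell
# Hypothetical sketch of the MD5-cache check described above.
# After a PUT, the MD5 of the data just written is cached; on the next GET
# it is compared against the ETag, and a mismatch is treated as stale.
CACHED_MD5=''

put_block() {                       # $1 = MD5 (hex) of the data just PUT
    CACHED_MD5="$1"
}

get_block() {                       # $1 = ETag (hex) returned by S3
    if [ -n "${CACHED_MD5}" ] && [ "${CACHED_MD5}" != "$1" ]; then
        echo "rec'd stale content"  # fires even when the data is intact,
        return 1                    # because SSE-KMS rewrites the ETag
    fi
    echo "success"
}

put_block 'bbe1716c8850b90b7e297c4b67138ce1'
get_block '302e943a37270eaf5e379356af47e4cb' || echo "phantom error reported"
```

With no cached MD5 (or after the entry expires), the check is simply skipped, which would match the observation that the errors only appear while the MD5 cache still holds recently written blocks.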

archiecobbs commented 3 years ago

s3backer should be able to handle this I think. I'm still researching...

I am unable to reproduce the behavior where S3 returns an ETag different from the Content-MD5 value, even with AWS-KMS server-side encryption enabled.

In the meantime, if you have a chance, try applying this patch, reproducing the problem, and looking to see if there are any "failed to parse" errors logged.

diff --git a/http_io.c b/http_io.c
index 2de5007..b4e8912 100644
--- a/http_io.c
+++ b/http_io.c
@@ -2670,6 +2716,11 @@ http_io_curl_header(void *ptr, size_t size, size_t nmemb, void *stream)
 #endif
     if (http_io_parse_header(buf, ETAG_HEADER, "\"%32c\"", hashbuf) == 1)
         http_io_parse_hex(hashbuf, io->md5, MD5_DIGEST_LENGTH);
+    else {
+        const size_t hdrlen = strlen(ETAG_HEADER);
+        if (strncasecmp(buf, ETAG_HEADER, hdrlen) == 0 && buf[hdrlen] == ':')
+            (*io->config->log)(LOG_ERR, "%s %s: failed to parse: %s", io->method, io->url, buf);
+    }

     /* "x-amz-meta-s3backer-hmac" header */
 #if SHA_DIGEST_LENGTH != 20

I'm guessing you won't see any such errors logged, but would like to rule this out.

Thanks.

archiecobbs commented 3 years ago

Hang on, ignore previous comment. I think I see the problem.

Please try this patch:

diff --git a/http_io.c b/http_io.c
index 2de5007..039e495 100644
--- a/http_io.c
+++ b/http_io.c
@@ -1762,6 +1762,8 @@ http_io_write_prepper(CURL *curl, struct http_io *io)
     curl_easy_setopt(curl, CURLOPT_READDATA, io);
     curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, http_io_curl_reader);
     curl_easy_setopt(curl, CURLOPT_WRITEDATA, io);
+    curl_easy_setopt(curl, CURLOPT_HEADERFUNCTION, http_io_curl_header);
+    curl_easy_setopt(curl, CURLOPT_HEADERDATA, io);
     if (io->src != NULL) {
         curl_easy_setopt(curl, CURLOPT_UPLOAD, 1);
         curl_easy_setopt(curl, CURLOPT_INFILESIZE_LARGE, (curl_off_t)io->buf_size);
Kirin-kun commented 3 years ago

This patch didn't have any noticeable effect.

"Errors" are still there. After a few minutes it errors and, strangely, this time always on the same block.

LD_LIBRARY_PATH=/usr/local/lib /root/s3backer/s3backer --directIO --accessEC2IAM=custom_instance_role_s3 --blockSize=1M --size=500G --prefix=sse --listBlocks --region=eu-west-1 --baseURL=https://s3.eu-west-1.amazonaws.com/ --blockHashPrefix --timeout=300 --minWriteDelay=5000 --blockCacheSize=5000 --blockCacheFile=/mnt/data/s3cacheb --blockCacheRecoverDirtyBlocks --initialRetryPause=10000 --maxRetryPause=50000 --md5CacheSize=20000000 --md5CacheTime=20000 --debug -f bucket /mnt/s3b >/mnt/data/s3c.log 2>&1

./logwarn -f /dev/null -zp -m ^2020 /mnt/data/s3c.log ssec83f4578-0001d004
2020-10-29 20:00:39 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:Content-MD5: 6RX008kDOCgjdO5eVGPpPg==
2020-10-29 20:00:40 DEBUG: success: PUT https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:Content-MD5: 6RX008kDOCgjdO5eVGPpPg==
 RESP:ETag: "645ca45caa057be1ea32637deedbbee9"
2020-10-29 20:00:45 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:Content-MD5: svLKBrzVSB1Z6U9d5pCOSw==
2020-10-29 20:00:45 DEBUG: success: PUT https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:Content-MD5: svLKBrzVSB1Z6U9d5pCOSw==
 RESP:ETag: "adc814d1858d4e35a312686619ec8079"
2020-10-29 20:00:55 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:Content-MD5: 1IV6xV49lp9Y8LctDZjPQg==
2020-10-29 20:00:59 DEBUG: success: PUT https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:Content-MD5: 1IV6xV49lp9Y8LctDZjPQg==
 RESP:ETag: "6e62abcbaaf8d2e44c3c85d660878a6e"
2020-10-29 20:01:57 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:Content-MD5: SiXQkPBL4jJsllU0tCIB9g==
2020-10-29 20:01:57 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:Content-MD5: SiXQkPBL4jJsllU0tCIB9g==
2020-10-29 20:02:02 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:Content-MD5: w8U08pE8n0HPsqg73+DoiA==
2020-10-29 20:02:03 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:Content-MD5: w8U08pE8n0HPsqg73+DoiA==
2020-10-29 20:02:08 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:Content-MD5: MbhWgyNp9NgHRwlrNjSLgA==
2020-10-29 20:02:10 DEBUG: success: PUT https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:Content-MD5: MbhWgyNp9NgHRwlrNjSLgA==
 RESP:ETag: "476cb6cea3242312630e8c60918c6fee"
2020-10-29 20:02:27 DEBUG: GET https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:If-Match: "31b856832369f4d80747096b36348b80"
2020-10-29 20:02:27 INFO: rec'd stale content: GET https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:If-Match: "31b856832369f4d80747096b36348b80"
2020-10-29 20:02:37 INFO: retrying query (attempt #2): GET https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:If-Match: "31b856832369f4d80747096b36348b80"
2020-10-29 20:02:37 INFO: rec'd stale content: GET https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:If-Match: "31b856832369f4d80747096b36348b80"
2020-10-29 20:02:57 INFO: retrying query (attempt #3): GET https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:If-Match: "31b856832369f4d80747096b36348b80"
2020-10-29 20:02:57 INFO: rec'd stale content: GET https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:If-Match: "31b856832369f4d80747096b36348b80"
2020-10-29 20:03:17 INFO: retrying query (attempt #4): GET https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:If-Match: "31b856832369f4d80747096b36348b80"
2020-10-29 20:03:18 INFO: rec'd stale content: GET https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:If-Match: "31b856832369f4d80747096b36348b80"
2020-10-29 20:03:18 ERROR: giving up on: GET https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:If-Match: "31b856832369f4d80747096b36348b80"
2020-10-29 20:03:19 DEBUG: GET https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
2020-10-29 20:03:19 DEBUG: success: GET https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
 RESP:ETag: "476cb6cea3242312630e8c60918c6fee"
2020-10-29 20:03:19 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:Content-MD5: CWy2Y+pUXhknaFjx1w9p1w==
2020-10-29 20:03:34 DEBUG: success: PUT https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:Content-MD5: CWy2Y+pUXhknaFjx1w9p1w==
 RESP:ETag: "cb25a3b881d6de004ac6f7134c00a0da"
2020-10-29 20:04:22 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:Content-MD5: AQk6AY8hJCYDBtAiFE2ePg==
2020-10-29 20:04:22 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:Content-MD5: AQk6AY8hJCYDBtAiFE2ePg==
2020-10-29 20:04:27 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:Content-MD5: YK8Uic+bJF/WutehxN6gFg==
2020-10-29 20:04:28 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:Content-MD5: YK8Uic+bJF/WutehxN6gFg==
2020-10-29 20:04:33 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:Content-MD5: TSo2d8ws78tIrfqr3vZmgg==
2020-10-29 20:04:37 DEBUG: success: PUT https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:Content-MD5: TSo2d8ws78tIrfqr3vZmgg==
 RESP:ETag: "5cfc4cee8095a480e2ccc01b58e2f16c"
2020-10-29 20:04:50 DEBUG: GET https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:If-Match: "4d2a3677cc2cefcb48adfaabdef66682"
2020-10-29 20:04:50 INFO: rec'd stale content: GET https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:If-Match: "4d2a3677cc2cefcb48adfaabdef66682"
2020-10-29 20:05:00 INFO: retrying query (attempt #2): GET https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:If-Match: "4d2a3677cc2cefcb48adfaabdef66682"
2020-10-29 20:05:00 INFO: rec'd stale content: GET https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:If-Match: "4d2a3677cc2cefcb48adfaabdef66682"
2020-10-29 20:05:20 INFO: retrying query (attempt #3): GET https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:If-Match: "4d2a3677cc2cefcb48adfaabdef66682"
2020-10-29 20:05:20 INFO: rec'd stale content: GET https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:If-Match: "4d2a3677cc2cefcb48adfaabdef66682"
2020-10-29 20:05:40 INFO: retrying query (attempt #4): GET https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:If-Match: "4d2a3677cc2cefcb48adfaabdef66682"
2020-10-29 20:05:40 INFO: rec'd stale content: GET https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:If-Match: "4d2a3677cc2cefcb48adfaabdef66682"
2020-10-29 20:05:40 ERROR: giving up on: GET https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:If-Match: "4d2a3677cc2cefcb48adfaabdef66682"
2020-10-29 20:05:43 DEBUG: GET https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
2020-10-29 20:05:43 DEBUG: success: GET https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
 RESP:ETag: "5cfc4cee8095a480e2ccc01b58e2f16c"
2020-10-29 20:05:43 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:Content-MD5: 0zzXGEA88PmvuO4g88/k8Q==
2020-10-29 20:05:44 DEBUG: success: PUT https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:Content-MD5: 0zzXGEA88PmvuO4g88/k8Q==
 RESP:ETag: "22e59fd5f1a6805f0be3b50fb9bd1276"
2020-10-29 20:06:45 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:Content-MD5: w1Hb5js89ldHntUfyXqzsQ==
2020-10-29 20:06:46 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:Content-MD5: w1Hb5js89ldHntUfyXqzsQ==
2020-10-29 20:06:51 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:Content-MD5: gN2H08THN+Lcud0RGIZemQ==
2020-10-29 20:06:52 DEBUG: write aborted: PUT https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:Content-MD5: gN2H08THN+Lcud0RGIZemQ==
2020-10-29 20:06:57 DEBUG: PUT https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:Content-MD5: gF1KLbj9a/RAxmQa5oVANQ==
2020-10-29 20:06:58 DEBUG: success: PUT https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:Content-MD5: gF1KLbj9a/RAxmQa5oVANQ==
 RESP:ETag: "c0ed932d65ea356c6202895cf45d7c4e"
2020-10-29 20:07:16 DEBUG: GET https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:If-Match: "805d4a2db8fd6bf440c6641ae6854035"
2020-10-29 20:07:16 INFO: rec'd stale content: GET https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:If-Match: "805d4a2db8fd6bf440c6641ae6854035"
2020-10-29 20:07:26 INFO: retrying query (attempt #2): GET https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:If-Match: "805d4a2db8fd6bf440c6641ae6854035"
2020-10-29 20:07:26 INFO: rec'd stale content: GET https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
  REQ:If-Match: "805d4a2db8fd6bf440c6641ae6854035"

I didn't copy/paste everything, but:

# grep ERROR /mnt/data/s3c.log

2020-10-29 20:03:18 ERROR: giving up on: GET https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
2020-10-29 20:05:40 ERROR: giving up on: GET https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
2020-10-29 20:08:06 ERROR: giving up on: GET https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
2020-10-29 20:10:33 ERROR: giving up on: GET https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004
2020-10-29 20:12:58 ERROR: giving up on: GET https://bucket.s3.eu-west-1.amazonaws.com/ssec83f4578-0001d004

Within a few minutes, it errored 5 times on the same block. It looks like every time it reads that block back after writing to it, the checksum isn't what it expects, for some reason.

I used a smaller delay for this test, so the errors show up more often. It looks like a timing problem, but I don't know where.

Normal operations, even an rsync or an "rm -rf" on the filesystem, don't show any errors.

Only zpool trim triggers it, as far as I've tested.

archiecobbs commented 3 years ago

OK thanks! There is still more work to do but this confirms what is really happening.

It's happening on the same block probably because ZFS does the same sequence of reads & writes each time (I guess).

Kirin-kun commented 3 years ago

I compiled and ran the latest pull and so far, so good! :ok_hand:

It's been trimming and deleting blocks for 15 minutes and no error showed up, when yesterday there were some almost immediately.

Thanks for fixing this.

archiecobbs commented 3 years ago

Awesome!! I hadn't gotten around to asking you to try but thanks for going ahead and doing so.

The fix was easy once identified. Thanks for your testing which helped find it.

Kirin-kun commented 3 years ago

I will mention though that the trimming puts significant stress on the system, which is probably why these errors showed up so quickly in the first place. But as my goal is to minimize costs, I had to try.

When using zpool trim at its default rate, the system load goes through the roof, and iostat on the disk that holds the cache file (/dev/xvdb) shows 99% busy time. And when sharing a ZFS dataset over NFS, I got NFS I/O errors on the client side.

Once the trim was canceled, everything returned to normal and NFS hummed happily.

So, even though trimming is useful to reclaim space, it should be used with caution on an s3backer filesystem.

archiecobbs commented 3 years ago

Interesting. s3backer's block cache was designed with the assumption that the time spent on local disk I/O would always be dwarfed by the time spent sending data over the network.

If a lot of the network activity is DELETEs instead of PUTs, then maybe that's no longer true. Or it may simply be a wrong assumption these days that the network is always slower than disk.

Does ZFS have a way to delete unused blocks on an "as you go" basis like ext4 does with the discard mount option (link)? That could be a better way to go.

Kirin-kun commented 3 years ago

Yes, ZFS does have this. You can set the pool property autotrim=on and it will discard in the background automatically. https://github.com/openzfs/zfs/commit/1b939560be5c51deecf875af9dada9d094633bf7

Since the automatic TRIM will skip ranges it considers too small
there is value in occasionally running a full `zpool trim`.
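For reference, enabling this looks roughly like the following (the pool name `tank` is illustrative):

```shell
# enable background discard of freed ranges
zpool set autotrim=on tank

# still run a full trim occasionally, to catch ranges autotrim skipped
zpool trim tank

# optionally throttle the trim so it doesn't saturate the cache disk
zpool trim -r 100M tank
```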

I don't know exactly why the trim is so demanding. It calculates, one way or another, which sectors are unallocated and sends an UNMAP/TRIM for those.

The thing is, since ZFS blocks are smaller than s3backer blocks, it must create a lot of unnecessary traffic writing partially empty blocks back and forth. So I guess there would be value in aggregating the discards so that a single delete can be issued, instead of rewriting a block several times until it's really empty.

I'm not a ZFS expert, so I don't know whether changing the recordsize (which is distinct from the block size) to align it with the s3backer block size would help.

Kirin-kun commented 3 years ago

As for disk vs. network: as you say, I'm not sure the default gp2 disks are actually faster than the network path to S3, even for reads/writes.

And if the block cache file is big, there may be overhead in seeking/writing blocks that becomes significant as the number of threads and I/Os grows.

I don't think the stats file tracks I/O statistics for the cache file.

archiecobbs commented 3 years ago

The thing is, since zfs blocks are smaller than s3backer blocks, it must create a lot of unnecessary traffic writing partially empty blocks back and forth, so I guess there would be value into aggregating the discards so a single delete can be issued and not rewriting a block several times until it's really empty.

Increasing --blockCacheWriteDelay may help this, depending on ZFS access patterns of course.
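A hypothetical invocation to illustrate: holding dirty blocks in the cache longer gives repeated trims/writes to the same block a chance to coalesce before anything hits S3 (the 30-second value and the other flags here are just examples):

```shell
# hold dirty blocks for 30 s before writing them back to S3
s3backer --blockCacheWriteDelay=30000 --blockCacheSize=5000 \
    --blockSize=1M --size=500G bucket /mnt/s3b
```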