archiecobbs / s3backer

FUSE/NBD single file backing store via Amazon S3

Retaining multiple versions of blocks when used with B2 #221

Closed · jordemort closed this issue 7 months ago

jordemort commented 7 months ago

Hi, I'm attempting to use s3backer with Backblaze B2's "S3 Compatible API" and ZFS. It mostly seems to work well, except that s3backer is creating and retaining multiple versions of each filesystem block in the B2 bucket. I have the bucket set to keep only the most recent version of any particular object, but on a freshly-created filesystem that I wrote 32 meg of random data to, some blocks have 10 or 11 versions in the bucket and the whole thing somehow takes up 90 meg of storage.

Here's my configuration file:

--accessFile=/etc/b2/auth
--baseURL=https://s3.us-west-001.backblazeb2.com/
--blockSize=64k
--blockCacheFile=/var/b2/cache
--blockCacheNumProtected=32
--blockCacheRecoverDirtyBlocks
--blockCacheSize=131072
--blockCacheWriteDelay=5000
--directIO
--listBlocks
--md5CacheSize=32768
--md5CacheTime=10000
--prefix=s3backer/
--size=2T

...and here's the systemd service that I'm running s3backer with:

[Unit]
Description=s3backer running in NBD mode
After=network-online.target
Wants=network-online.target
Documentation=https://github.com/archiecobbs/s3backer

[Install]
WantedBy=multi-user.target

[Service]
Type=forking
ExecStart=/usr/bin/s3backer --nbd --configFile=/etc/b2/s3backer.conf shady-dearborn /dev/nbd0
Restart=on-failure

# Security hardening
ProtectSystem=full
#ProtectHome=read-only
ProtectHostname=true
ProtectClock=true
ProtectKernelTunables=true
ProtectKernelLogs=true
ProtectControlGroups=true
RestrictRealtime=true

I created my zpool like so:

zpool create -o ashift=16 -o cachefile=none -o autotrim=on -O atime=off -O compress=zstd -O acltype=posixacl -O xattr=sa -O devices=on b2pool /dev/nbd0

...and then my filesystem like this:

zfs create -o encryption=on -o keylocation=file:///var/b2/zfs/key -o keyformat=passphrase -o mountpoint=/b2 b2pool/data

I opened a ticket with Backblaze, but they said it must be something that s3backer is doing. However, I would not be surprised to learn that their "S3 Compatible API" isn't sufficiently compatible for s3backer. I couldn't find any examples of anyone else using s3backer with this service.
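For what it's worth, the versioning state the S3-compatible endpoint itself reports can be checked directly (a sketch using the aws CLI, assuming it's configured with my B2 application key and that shady-dearborn is the bucket):

```shell
# Ask the S3-compatible endpoint what it thinks the bucket's versioning
# state is: "Enabled", "Suspended", or an empty response (never versioned).
aws s3api get-bucket-versioning \
    --endpoint-url https://s3.us-west-001.backblazeb2.com \
    --bucket shady-dearborn
```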

archiecobbs commented 7 months ago

I opened a ticket with Backblaze, but they said it must be something that s3backer is doing.

I'm afraid they may have made an improper logical leap.

s3backer does not include any flag in its requests that says "retain multiple versions" (I'm not even sure such a flag exists; if it does, I've never used it).

This must be something that is being done on the server side (i.e., Backblaze), whether they want to admit it or not. They are either (mis)configured to create multiple versions, or they have a bug in their code that's causing it.

If they do come up with something in s3backer's request that is somehow causing them to create multiple versions of a block I'd be interested to see it...

JeffByers commented 7 months ago

Backblaze and most other S3 services optionally support S3 object versioning, which is usually configured on the bucket. For any s3backer application, you would not want versioning enabled, of course.
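If the endpoint does report versioning as enabled, it can be suspended through the same S3 API (a sketch, assuming the aws CLI; substitute your own endpoint and bucket name):

```shell
# Suspend versioning on the bucket. This only stops new versions from
# being created; any existing old versions remain until deleted.
aws s3api put-bucket-versioning \
    --endpoint-url https://s3.us-west-001.backblazeb2.com \
    --bucket my-bucket \
    --versioning-configuration Status=Suspended
```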

~ Jeff Byers ~


jordemort commented 7 months ago

Backblaze and most other S3 services optionally support S3 object versioning, which is usually configured on the bucket. For any s3backer application, you would not want versioning to be enabled of course.

Yes, as far as I can tell, I have object versioning disabled in Backblaze's interface, but it's acting like it's enabled when used with s3backer. FWIW, when I use B2 with s3ql, object versioning does seem to be disabled, but that's using the native B2 API rather than their S3-compatible one. Thanks for the help, I'll push back on them.

JeffByers commented 7 months ago

I've not used ZFS (recently), but it could be something that ZFS is doing to you.

ZFS 101—Understanding ZFS storage and performance | Ars Technica https://arstechnica.com/information-technology/2020/05/zfs-101-understanding-zfs-storage-and-performance/3/

There may be a delay before ZFS reclaims old blocks, or a command to run. This assumes that ZFS does not have any snapshots, which would require keeping the old blocks around.
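If ZFS is the culprit, a few commands can rule it in or out (a sketch; the pool name is taken from the zpool create command above):

```shell
zfs list -t snapshot     # any snapshots listed here will pin old blocks
zpool trim b2pool        # force an immediate TRIM/discard pass on the pool
zpool status -t b2pool   # show per-vdev trim progress
```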

~ Jeff Byers ~


jordemort commented 7 months ago

I've been looking into B2's documentation on lifecycle rules: https://www.backblaze.com/docs/cloud-storage-lifecycle-rules

I'm not 100% certain, but based on my reading, it sounds like data on B2 cannot be retained for less than a day. Setting daysFromHidingToDeleting to null means "never delete anything", and it cannot be set to 0; any non-null value must be at least 1.
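For reference, this is roughly what the fastest-possible rule would look like (a sketch using the b2 CLI; the bucket name and type are placeholders):

```shell
# daysFromHidingToDeleting cannot be 0, so 1 day is the soonest B2 will
# actually purge hidden (overwritten) versions under the s3backer/ prefix.
b2 update-bucket --lifecycleRules '[{
  "fileNamePrefix": "s3backer/",
  "daysFromUploadingToHiding": null,
  "daysFromHidingToDeleting": 1
}]' my-bucket allPrivate
```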

I'm not sure that this is going to end up being workable for me; I might have to try it out on Wasabi, or resort to actually using the real S3 :)

I am surprised at just how frequently particular blocks are getting overwritten; looking at one of the blocks with 11 different versions, all 11 appear to have been written within the span of a minute. Is there a way to prevent blocks from being flushed out to S3 until they've been quiescent for a while? I was trying to achieve something like that with the md5 options, but I don't think that's what those actually do.

archiecobbs commented 7 months ago

Is there a way to prevent blocks from being flushed out to S3 until they've been quiescent for a while?

The block cache should automatically do this (assuming it's big enough to not get blown out by your activity level).
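To hold dirty blocks longer before they go out, the relevant knobs are the block cache settings already in your config (illustrative values, not a recommendation):

```shell
--blockCacheWriteDelay=30000   # keep dirty blocks in cache up to 30s, so
                               # rapid rewrites coalesce into a single PUT
--blockCacheSize=131072        # must be large enough that dirty blocks
                               # aren't evicted (and written) early
```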

I was trying to achieve something like that with the md5 options but I don't think that's what that actually does.

The MD5 options are called "cache" options, but that's misleading: they don't cache data, they cache events.

archiecobbs commented 7 months ago

Closing this issue as not an s3backer issue. Feel free to add more comments if new information becomes available.