Closed: HaleTom closed this issue 2 years ago.
Can you please confirm that once a --blockSize is set, it cannot be changed?
Correct.
Read amplification / Write amplification
I can't really make any blanket statements because it all depends on lots of details (upper filesystem, kernel behavior, caching, etc.). But in general: (a) if the kernel reads and writes data in --blockSize chunks, that's more efficient; but also (b) there is some fixed per-block overhead when reading from and writing to S3.
Am I right in thinking that this is a bitmap, with one bit per block (used / not used)?
Yes.
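To get a feel for what one bit per block costs, here is a minimal arithmetic sketch (not s3backer code; the device and block sizes are illustrative assumptions):

```python
# Back-of-the-envelope size of a used-block bitmap: one bit per block.

def bitmap_bytes(device_size: int, block_size: int) -> int:
    """Bytes needed to track one used/unused bit per block."""
    n_blocks = device_size // block_size
    return (n_blocks + 7) // 8  # round up to whole bytes

TIB = 1 << 40
KIB = 1 << 10

# 16 TiB device with 4 KiB blocks -> 2^32 blocks -> 512 MiB of bitmap
print(bitmap_bytes(16 * TIB, 4 * KIB))   # 536870912

# Same device with 1 MiB blocks -> 2^24 blocks -> 2 MiB of bitmap
print(bitmap_bytes(16 * TIB, 1 << 20))   # 2097152
```

Larger blocks shrink the bitmap quadratically relative to what they shrink block count, since both the number of blocks and the bits tracking them fall together.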
Writing each block has a certain amount of S3 API overhead. Larger blocks presumably reduce this overhead percentage, but that is not shown in https://github.com/archiecobbs/s3backer/issues/158 (see below).
Regarding issue #158, it is still unclear what is actually happening there.
Presumably, the cache should be large enough so that it can contain sufficient blocks to allow for the desired parallelism of blocks being written simultaneously.
Yes, that makes sense.
Cheers for the feedback!
I'll update this if/when I generate performance figures. For now the man page suggests 1M as an example block size.
A manual note on >16T devices that's not listed beside --blockSize:
For cache space efficiency, s3backer uses 32 bit values to index individual blocks. Therefore, the block size must be increased beyond the default 4K when very large filesystems (greater than 16 terabytes) are created.
Glad I clocked that one as I'm going for 1P. Would you consider moving both (separated) BUGS notes next to --blockSize, where most people will read them?
Oh, I just noticed your first BUGS paragraph is duplicated... if only every bug was as easy to "remove" :)
Not sure what you mean... what exactly are you seeing is duplicated?
Sorry, PEBCAK: a less seek display glitch (or just double-sightedness?)
OK no problem. Closing this issue but feel free to reopen if you have more suggestions. Thanks.
What are the recommendations / heuristics for selecting the value for --blockSize?

Immutability

Can you please confirm that once a --blockSize is set, it cannot be changed?
Read amplification

I'm guessing blockSize will need to be <= the sector size of the device or filesystem placed on top, so as to avoid reading a larger block to retrieve just one part of it (the sector).

Write amplification

Same as read amplification, but there is extra overhead as the unchanged data also needs to be written back.
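The read-modify-write cost is simple to quantify: the whole block is re-uploaded even if only one sector changed. A small illustration (sizes are example assumptions, not s3backer defaults):

```python
# Rough write-amplification factor for a read-modify-write of one block.

def write_amplification(block_size: int, write_size: int) -> float:
    """Bytes written to S3 per byte the application actually changed."""
    return block_size / write_size

KIB, MIB = 1 << 10, 1 << 20

print(write_amplification(4 * KIB, 512))      # 8.0   -> 512 B sector, 4 KiB block
print(write_amplification(1 * MIB, 4 * KIB))  # 256.0 -> 4 KiB write, 1 MiB block
```

This is the worst case for scattered small writes; sequential writes that fill whole blocks see no amplification, which is why the answer above hedges on workload details.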
Used block bitmap

With --listBlocks, the load time will be longer if more blocks need to be listed to represent the same data.

Overheads

Writing each block has a certain amount of S3 API overhead. Larger blocks presumably reduce this overhead percentage, but that is not shown in #158 (see below).
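Since S3 charges and throttles per request, the per-block overhead amortizes directly with block size. A purely arithmetic illustration of request counts for the same amount of data:

```python
# Number of S3 PUTs needed to write 1 GiB in block_size chunks:
# larger blocks mean fewer requests carrying the same fixed per-request cost.

def requests_per_gib(block_size: int) -> int:
    GIB = 1 << 30
    return GIB // block_size

print(requests_per_gib(4 * (1 << 10)))  # 262144 PUTs at the 4 KiB default
print(requests_per_gib(1 << 20))        # 1024 PUTs at 1 MiB blocks
```

Whether the 256x reduction in requests outweighs the write amplification above depends on the workload, which is presumably what #158 was trying to measure.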
Write speed and interactions with cache size
Presumably, the cache should be large enough so that it can contain sufficient blocks to allow for the desired parallelism of blocks being written simultaneously.
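The sizing rule in that paragraph can be sketched as a lower bound (my own framing; parallel_writers and block_size are example values, not s3backer defaults):

```python
# The block cache should hold at least as many blocks as you want
# in flight at once, or parallel uploads will stall waiting for cache slots.

def min_cache_bytes(parallel_writers: int, block_size: int) -> int:
    """Lower bound on cache size: one resident block per in-flight write."""
    return parallel_writers * block_size

MIB = 1 << 20
print(min_cache_bytes(20, 1 * MIB))  # 20971520 -> 20 MiB for 20 parallel 1 MiB uploads
```

Note this is a floor, not a recommendation: extra cache beyond it also absorbs repeated reads and write coalescing.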
Prior art: