YoRyan / sia-slice

Maintain disk images or other large files indefinitely on the Sia network.
MIT License

Is the 100MB size decision optimal? #2

Open dim-geo opened 4 years ago

dim-geo commented 4 years ago

Hi,

Can you please elaborate on the 100MB size decision? Sia uses 40MB as a block.

Does LZ always compress chunks of data to sizes near, but not above, 40MB? From your screenshot it seems that chunks can reach 50MB, which is bad because a 50MB chunk would consume 80MB of Sia storage (and upload?) in total.

Would it make sense to use a block size near 40MB?

In that case the blocks would always be close to the 40MB limit without crossing it, and you could estimate more accurately the space that sia-slice will consume (at the risk of underusing that space).
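
To make the arithmetic concrete, here is a rough back-of-the-envelope sketch (mine, not part of sia-slice) that treats 40MB as the atomic Sia chunk size and rounds each uploaded block up to whole chunks:

```python
# Back-of-the-envelope sketch (not from sia-slice): treat 40MB as the atomic
# Sia chunk size and round each uploaded block up to whole chunks.
import math

SIA_CHUNK_MB = 40

def chunks_used(block_mb: float) -> int:
    """Number of 40MB chunks needed to store one (possibly compressed) block."""
    return math.ceil(block_mb / SIA_CHUNK_MB)

def wasted_mb(block_mb: float) -> float:
    """Space consumed on Sia beyond the block's actual size."""
    return chunks_used(block_mb) * SIA_CHUNK_MB - block_mb

print(chunks_used(50), wasted_mb(50))      # 2 chunks, 30MB wasted (80MB consumed)
print(chunks_used(39.5), wasted_mb(39.5))  # 1 chunk, only 0.5MB wasted
```

Under that assumption, a 50MB block really does consume 80MB, while a block just under 40MB is nearly loss-free.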

YoRyan commented 4 years ago

The 100MB block size is by no means optimal. It's a balance between minimizing the number of files that need to be periodically reuploaded and maximizing the potential LZ compression. I chose a value that I thought was reasonable for multi-terabyte targets: 10,000 files per 1TB. One must also, of course, mind the atomic Sia chunk size...

...which I had misinterpreted. The Sia docs imply the 40MB limit is a minimum file size and that as long as your files are larger than that, you'll be okay. But of course, you'll also waste lots of space if your files are all 41MB, since each one would occupy two full 40MB chunks! So for Sia Slice, I feel like an 80MB block size would be a sane default; just write off any losses to LZ compression.
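
To put some rough numbers of my own on that trade-off (assuming 40MB atomic chunks and a 1TB target of 1,000,000MB; these figures are illustrative, not project output):

```python
# Rough illustration of the trade-off: how many per-block files a 1TB target
# needs, and how far each block size sits from a 40MB chunk boundary before
# any compression is taken into account.
import math

SIA_CHUNK_MB = 40
TARGET_MB = 1_000_000  # 1TB

for block_mb in (40, 80, 100):
    n_files = math.ceil(TARGET_MB / block_mb)
    chunks_per_block = math.ceil(block_mb / SIA_CHUNK_MB)
    slack_mb = chunks_per_block * SIA_CHUNK_MB - block_mb
    print(f"{block_mb}MB blocks: {n_files} files, "
          f"{chunks_per_block} chunks each, {slack_mb}MB slack per uncompressed block")
```

An uncompressed 80MB block lands exactly on a chunk boundary, which is part of why it looks like a sane default; when LZ shrinks a block to somewhere between boundaries, the unused remainder of its last chunk is the loss being written off.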

If you want to tweak this yourself, there is currently no CLI switch, but the block size is easily accessible as a constant at the top of siaslice.py. On subsequent syncs, Sia Slice will pick up the last block size used, even if it differs from this constant. But choose wisely, because there's no way to change the block size after the initial sync.
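
For example (the constant's name and value below are illustrative; check the top of siaslice.py for the real one), changing the block size before the very first sync would look something like this:

```python
# Illustrative only -- the actual constant name in siaslice.py may differ.
# Edit before the *initial* sync; later syncs reuse the block size recorded
# during the first sync, so changing this afterwards has no effect.
DEFAULT_BLOCK_MB = 80  # was 100
```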

YoRyan commented 4 years ago

A CLI switch is now implemented as of 018a2f39c81e8096a9baec6faf1e8ad45fdf44f9.