Closed · dionhaefner closed this issue 6 years ago
What is this `none` algorithm? It seems to be way better than everything else for speed, and for float compression it's still better than LZW.
Jokes aside, maybe we should add an option for turning off compression? I could see the increased disk use as an acceptable trade-off for speed (disk space is cheap).
Something the article doesn't consider is large patches of nodata, so in practice the space savings are often much bigger than the given compression ratios. I therefore think some light compression should still be the default, but I agree that an option to turn it off wouldn't hurt.
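The nodata effect is easy to demonstrate with a quick sketch using Python's built-in `zlib` (i.e. DEFLATE), standing in for the GeoTIFF codecs discussed here. The tile sizes and the `-9999.0` nodata value are just illustrative assumptions:

```python
import random
import struct
import zlib

random.seed(0)

# A tile filled with a constant nodata value compresses almost to nothing
nodata_tile = struct.pack("<1024f", *([-9999.0] * 1024))

# A tile of pseudo-random floats is nearly incompressible
noise_tile = struct.pack("<1024f", *[random.random() for _ in range(1024)])

for name, raw in [("nodata", nodata_tile), ("noise", noise_tile)]:
    packed = zlib.compress(raw)
    print(f"{name}: {len(raw)} -> {len(packed)} bytes "
          f"(ratio {len(raw) / len(packed):.1f}x)")
```

The nodata tile shrinks by orders of magnitude, while the noise tile barely compresses, which is why published compression ratios understate the savings on real rasters with nodata borders.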
Yes, having compression on is a sane default. But being able to turn it off seems like a potentially very big performance boost, given that reading in the tiles is the current performance bottleneck, AFAIK?
> given that reading in the tiles is the current performance bottleneck
That's true, but I don't know if we are actually CPU bound or IO bound in real-world applications (probably both; I would expect Lambda deployments to be IO bound, and WSGI deployments to be CPU bound). If we are IO bound, compression should actually increase performance :)
I'll introduce a flag.
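A minimal sketch of what such a flag could translate to internally. The helper name and the `compression` parameter are hypothetical, but the returned keys (`COMPRESS`, `ZLEVEL`, `ZSTD_LEVEL`) are standard GDAL GeoTIFF creation options, with ZSTD requiring GDAL >= 2.3:

```python
def compression_to_creation_options(compression="deflate"):
    """Map a user-facing compression flag to GDAL GeoTIFF creation options.

    Hypothetical helper; the returned keys are real GDAL GeoTIFF
    creation options.
    """
    if compression == "none":
        return {"COMPRESS": "NONE"}
    if compression == "deflate":
        # light compression: favors read speed over compression ratio
        return {"COMPRESS": "DEFLATE", "ZLEVEL": "1"}
    if compression == "zstd":
        # COMPRESS=ZSTD is only available in GDAL >= 2.3
        return {"COMPRESS": "ZSTD", "ZSTD_LEVEL": "1"}
    raise ValueError(f"unknown compression setting: {compression}")
```

Keeping the mapping in one place like this would make it easy to add ZSTD later, once the conda-forge GDAL build catches up.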
Can't use ZSTD compression yet, since the GDAL version linked via conda-forge is too old. I'd expect it to be bumped in one of the upcoming rasterio releases.
For the record, we have been using DEFLATE, not LZW.
> That's true, but I don't know if we are actually CPU bound or IO bound in real-world applications (probably both; I would expect Lambda deployments to be IO bound, and WSGI deployments to be CPU bound). If we are IO bound, compression should actually increase performance :)
Yes, he's doing the benchmark on an SSD machine, which is probably a crucial factor. I would expect non-SSD machines to be heavily I/O bound, so in that case I would also expect compression to actually speed up reading.
I looked at some other benchmarks of compression algorithms, but it seems genuinely difficult to produce a useful one. Decompression performance is of course highly dependent on the data, but apparently also on CPU architecture and not just CPU speed, so it's highly situation and machine dependent. Still, the order of magnitude appears to be 100 MB/s to 2 GB/s, while a modern SSD has read speeds of around 650 MB/s to 2.3 GB/s. In any case, I think it will be impossible to pick a default that works equally well for all setups.
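Anyone curious can reproduce a throughput number like those above on their own machine with a quick `zlib` sketch; the payload below is an arbitrary, moderately compressible stand-in for raster tiles, so the printed MB/s figure is only indicative:

```python
import time
import zlib

# ~1 MB of moderately compressible payload standing in for raster tiles
data = (b"terracotta-tile-" * 1024 + bytes(range(256))) * 64

packed = zlib.compress(data, 1)

start = time.perf_counter()
out = zlib.decompress(packed)
elapsed = time.perf_counter() - start

assert out == data
print(f"decompressed {len(data) / 1e6:.1f} MB in {elapsed * 1000:.2f} ms "
      f"({len(data) / elapsed / 1e6:.0f} MB/s)")
```

Running this on different hardware illustrates the point: the same code and data can land anywhere in that 100 MB/s to 2 GB/s range depending on the machine.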
@j08lue found this interesting article benchmarking GDAL's compression algorithms. We are currently using LZW, which is horrible for floating-point data. We mostly care about read speed, then compression ratio, then write speed, so ZSTD looks like a better alternative.
Documentation of GDAL's compression options can be found here.