Closed frederickluser closed 1 week ago
Hi @frederickluser, that's an interesting question!
`fst` uses LZ4 (highest speeds) and ZSTD (lowest speeds) for compression and decompression. In general, the size of your `fst` file will be smallest at the highest compression settings.
Both compression algorithms take more time to compress when the compression setting is higher, but for decompression time there is almost no difference.
So if you want to write once and read often, your best option is to use the highest compression setting possible. With equal decompression times, the smaller number of bytes that need to be read from disk will shorten your reading times :-)
If you had an infinitely fast disk, the reading time would be limited only by decompression speed, and the actual level selected would probably not matter much.
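The tradeoff described above can be sketched with a generic compressor. The snippet below uses Python's stdlib `zlib` purely as an illustration (fst itself uses LZ4 and ZSTD, not zlib, and real timings depend on your data and hardware): higher levels cost more compression time and produce a smaller payload, while decompression time stays roughly flat.

```python
# Illustrative sketch of the write-once / read-often tradeoff using a
# generic compressor (Python's stdlib zlib), NOT fst's actual LZ4/ZSTD.
import time
import zlib

# Moderately repetitive data, standing in for a typical columnar data file.
data = b"observation,value\n" * 200_000

for level in (1, 6, 9):
    t0 = time.perf_counter()
    compressed = zlib.compress(data, level)
    t_comp = time.perf_counter() - t0

    t0 = time.perf_counter()
    restored = zlib.decompress(compressed)
    t_decomp = time.perf_counter() - t0

    assert restored == data  # round-trip must be lossless
    print(f"level={level} size={len(compressed):>9,} "
          f"compress={t_comp:.4f}s decompress={t_decomp:.4f}s")
```

On repetitive data like this you should see the level-9 output is no larger than the level-1 output, while the decompression column barely moves — the same qualitative behavior described for fst above.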
Hope that helps :-)
(PS: the benchmark figure in the README also shows that, at the fast (but limited) disk speed used there, more compression leads to higher reading speeds.)
Hey Marcus
Great, thanks a lot for the super informative answer! That is very helpful.
All the best, Frederic
Thank you so much for all your great work. I wondered which compression factor would minimize reading time for large files with, e.g., 100 million observations, if I'm not concerned about writing time. Do you have any intuition or previous benchmarks from, let's say, extreme cases (e.g., compress = 0, 50, 100)?
EDIT: I guess optimal compression rates also depend on one's hardware. In my case at least, I work on a quite powerful machine: 36 virtual processors, 2.3 GHz, 440 GB ...
Any comment highly appreciated. All the best, Frederic