Open p2004a opened 8 months ago
Perhaps some sort of "clear pool and consolidate into a single .sdz archive" option on top of existing rapid? BAR wants incremental updates because it has no release cycle, but maybe this won't be a worry long term because it should eventually have one. And then a release is a perfect moment to consolidate the rapid pool.
ZK does something equivalent to the above: when there's a release we build an .sdz archive via infra and distribute it via Steam, but this wouldn't necessarily be needed if the archive could be built locally by clients from the pool.
Interesting idea. Such local consolidation into an sdz/sd7 archive could be a middle ground that's also worth considering. Some observations:
At the moment I would personally still lean slightly towards trying out the embedded database, as it might end up being a simpler architecture and codebase overall, but that would need to be tried out to confirm.
Problem
Rapid format is serving us relatively well because it provides good incremental updates support.
The major disadvantage of the current Rapid on-disk storage format is that it makes game loading rather slow:
All files are compressed using gzip
Currently in BAR the load cost due to gzip is ~2s in my tests from some time ago: https://discord.com/channels/549281623154229250/724924957074915358/1132367444837683240
Every single archive file is stored on disk separately
BAR has over 10000 files. There is high overhead because of all the syscalls and it's especially terrible on Windows. BAR has workarounds that open and close all pool files in lobby to reduce the load times by prewarming OS caches and triggering Antivirus on-open scans.
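To make the prewarming workaround concrete, here is a minimal sketch of the idea: walk the pool directory and open and close every file without reading it. This is not BAR's actual implementation; the function name `prewarm` and the directory layout are assumptions made for illustration.

```python
import os


def prewarm(pool_dir: str) -> int:
    """Open and immediately close every file under pool_dir.

    Returns the number of files that could be opened. Nothing is read:
    the point is only to touch filesystem metadata and to give on-open
    antivirus scanners a chance to run before the game actually loads.
    """
    touched = 0
    for root, _dirs, files in os.walk(pool_dir):
        for name in files:
            try:
                with open(os.path.join(root, name), "rb"):
                    touched += 1
            except OSError:
                # A missing or locked file should not abort the warm-up pass.
                continue
    return touched
```

With more than 10000 files this still costs one open/close pair per file, which is the overhead a single-file storage format would avoid.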
This issue is to investigate more performant solutions that still offer good support for incremental updates.
Compression
For the compression: zstd and lz4 are great options. zstd has compression ratios very similar to gzip, but much faster decompression. lz4 decompression is on the order of GB/s, so it's close to transparent, and it makes it more feasible for software like pr-downloader to re-compress objects on the fly while downloading, without changing the content distribution format.
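The on-the-fly re-compression could look roughly like the sketch below: gzip data arrives in chunks, is inflated incrementally, and is fed straight into another compressor, so the downloaded format stays gzip while the stored format changes. This is only an illustration, not pr-downloader code. `recompress_stream` is a hypothetical name, and the target codec is any object with `compress()` and `flush()` methods. The example uses `lzma` from the standard library purely as a placeholder so that it runs without third-party packages; a real version would plug in zstd or lz4 bindings instead.

```python
import gzip
import lzma
import zlib
from typing import Iterable, Iterator


def recompress_stream(gz_chunks: Iterable[bytes], target) -> Iterator[bytes]:
    """Decode a gzip stream chunk by chunk and re-encode it with `target`.

    `target` is any object exposing compress(bytes) and flush(); it stands
    in for a zstd or lz4 compressor (assumption of this sketch).
    """
    # MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
    inflater = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
    for chunk in gz_chunks:
        raw = inflater.decompress(chunk)
        if raw:
            out = target.compress(raw)
            if out:
                yield out
    tail = inflater.flush()
    if tail:
        out = target.compress(tail)
        if out:
            yield out
    yield target.flush()


# Usage: simulate a download delivered in 64-byte network chunks.
payload = b"unit definition " * 1000
gz = gzip.compress(payload)
chunks = [gz[i:i + 64] for i in range(0, len(gz), 64)]
stored = b"".join(recompress_stream(chunks, lzma.LZMACompressor()))
```

Because nothing is buffered beyond one chunk, memory use stays flat regardless of object size.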
Many small files
For the many small files issue, we can investigate using an embedded key-value store database like LevelDB, RocksDB, or LMDB. By storing the files in the embedded database, we get optimized storage access and fully incremental changes, just like we offer at the moment.
LMDB focuses on read performance, supports concurrent access from multiple independent processes out of the box (pretty rare for embedded databases, and important for the existing API usage and engine <-> pr-downloader interaction), has great platform support, a small code base, and modern C++ bindings if we want them.
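As a rough illustration of the storage model, the sketch below keeps every object in a single database file, keyed by its content hash, and shows that an update only has to write the objects that are not present yet. It uses `sqlite3` only because it ships with Python; it is a stand-in for the key-value stores named above, not a proposal to use SQLite. The class name `BlobStore`, the table layout, and the choice of SHA-256 as the key are all assumptions of this sketch.

```python
import hashlib
import sqlite3
from typing import Iterable, List, Optional


class BlobStore:
    """Content-addressed blob store in one file (stand-in for an embedded KV store)."""

    def __init__(self, path: str) -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS pool (key BLOB PRIMARY KEY, data BLOB NOT NULL)"
        )

    def put(self, data: bytes) -> bytes:
        key = hashlib.sha256(data).digest()
        with self.db:  # commits the transaction on success
            # Existing keys are left untouched, so re-adding an object is a no-op.
            self.db.execute(
                "INSERT OR IGNORE INTO pool (key, data) VALUES (?, ?)", (key, data)
            )
        return key

    def get(self, key: bytes) -> Optional[bytes]:
        row = self.db.execute("SELECT data FROM pool WHERE key = ?", (key,)).fetchone()
        return None if row is None else row[0]

    def missing(self, keys: Iterable[bytes]) -> List[bytes]:
        """Keys a new version references that still have to be downloaded."""
        return [k for k in keys if self.get(k) is None]


# Usage: version 2 adds one file on top of version 1, so one object is missing.
store = BlobStore(":memory:")
v1 = [store.put(b"units.lua v1"), store.put(b"weapons.lua v1")]
v2 = v1 + [hashlib.sha256(b"maps.lua v1").digest()]
to_fetch = store.missing(v2)
```

The incremental-update property comes from the content-addressed keys rather than from the database engine, so it should carry over to LMDB or any other key-value store.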
The main unknowns in this approach are:
Whether performance will be good for both small and large files.
Storing large files separately in the filesystem is an option, but it adds implementation complexity.
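To give a feel for that extra complexity, here is a minimal sketch of a hybrid store that routes objects by size: small ones go into the key-value store, large ones into individual files. Everything here is hypothetical, including the `HybridStore` name and the 1 MiB threshold. A plain `dict` stands in for the embedded database to keep the example short.

```python
import hashlib
import os
from typing import MutableMapping, Optional

LARGE_THRESHOLD = 1 << 20  # hypothetical cut-off: 1 MiB


class HybridStore:
    """Small objects live in a key-value mapping, large ones as separate files."""

    def __init__(self, kv: MutableMapping[str, bytes], blob_dir: str,
                 threshold: int = LARGE_THRESHOLD) -> None:
        self.kv = kv
        self.blob_dir = blob_dir
        self.threshold = threshold
        os.makedirs(blob_dir, exist_ok=True)

    def _path(self, key: str) -> str:
        return os.path.join(self.blob_dir, key)

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        if len(data) >= self.threshold:
            tmp = self._path(key) + ".tmp"
            with open(tmp, "wb") as f:
                f.write(data)
            # Rename into place so a concurrent reader never sees a partial file.
            os.replace(tmp, self._path(key))
        else:
            self.kv[key] = data
        return key

    def get(self, key: str) -> Optional[bytes]:
        # Two lookup paths: this is the added complexity mentioned above.
        if key in self.kv:
            return self.kv[key]
        try:
            with open(self._path(key), "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None
```

Besides the second lookup path, a real implementation would also have to keep the two locations consistent on deletes and interrupted writes, which a single store gets from its own transactions.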