When opening an Nx file, the file is (by definition) supposed to be at least 4K; but ideally,
we want to avoid any further reads beyond the 4K margin for a typical mod if possible; therefore minimizing the
file size to fit inside 4K is always desireable.
Create a version of the ToC header that is specialised for smaller archives.
Example:
u64: FileHash (xxHash64)
u28: DecompressedSize
u20: DecompressedBlockOffset [limits max block size]
u8: FilePathIndex (in [StringPool]]
u8: FirstBlockIndex
This gives us a header format with max 256 files, provided chunk size is under 1MiB and files themselves are under 256MiB.
This is a very common configuration.
Additional context
We want to get all file info in a single 4K header read,
Read Speeds of NVMe SSD (980 Pro)
This library is optimised in mind with smaller mods; most mods' TOC and header fits in within 4K.
On a 980 Pro (currently most popular NVMe), we observe the following latencies (QD 1):
4K: 52us
8K: 80us
---- Delimited For Margin of Error ----
16K: 95us
32K: 85us
64K: 89us
128K: 100us
256K: 140us
512K: 218us
Although benchmarks for random reads beyond 4K are hard to come by; this is a general pattern that seems to surface
from the limited data I could gather. In any case, this is the explanation for header size choice.
String Pool Size
The size of the string pool for an archive of 255 files is approximately 1000 bytes; leaving us roughly 4080 bytes of budget for the entries and blocks. Bit less if considering user data section into account.
This should allow us to fit around 150 files, an improvement of around 43 from the existing format.
Need one for Virtual FileSystems too, where chunk size may cap at 16K.
We can do this by reusing the now unused bits past version (due to lower maximums) as additional version fields.
Motivations
When opening an Nx file, the file is (by definition) supposed to be at least 4K; but ideally, we want to avoid any further reads beyond the 4K margin for a typical mod if possible; therefore minimizing the file size to fit inside 4K is always desireable.
This is an alternative to
Solution
Create a version of the ToC header that is specialised for smaller archives.
Example:
This gives us a header format with max 256 files, provided chunk size is under 1MiB and files themselves are under 256MiB. This is a very common configuration.
Additional context
We want to get all file info in a single 4K header read,
Read Speeds of NVMe SSD (980 Pro)
This library is optimised in mind with smaller mods; most mods' TOC and header fits in within 4K.
On a 980 Pro (currently most popular NVMe), we observe the following latencies (QD 1):
---- Delimited For Margin of Error ----
256K: 140us
Although benchmarks for random reads beyond 4K are hard to come by; this is a general pattern that seems to surface from the limited data I could gather. In any case, this is the explanation for header size choice.
String Pool Size
The size of the string pool for an archive of 255 files is approximately 1000 bytes; leaving us roughly 4080 bytes of budget for the entries and blocks. Bit less if considering user data section into account.
This should allow us to fit around 150 files, an improvement of around 43 from the existing format.