Closed jeremyBanks closed 5 months ago
I'll fix this sooner or later if nobody else gets to it first.
I'm working on this. :wave: This issue should probably have the crypto
label added for tracking. 🙂
For reference, here are some test vectors we can use to validate the results
From IETF draft-eastlake-fnv-21 (note that these are for the -A variants):
Appendix C. A Few Test Vectors
Below are a few test vectors in the form of ASCII strings and their FNV32 and FNV64 hashes using the FNV-1a algorithm.
Table 3: Strings without null (zero byte) termination
String FNV32 FNV64 "" 0x811c9dc5 0xcbf29ce4 84222325 "a" 0xe40c292c 0xaf63dc4c 8601ec8c "foobar" 0xbf9cf968 0x85944171f 73967e8 Table 4: Strings including null (zero byte) termination
String FNV32 FNV64 "" 0x050c5d1f 0xaf63bd4c 8601b7df "a" 0x2b24d044 0x089be207 b544f1e4 "foobar" 0x0c1c9eb8 0x34531ca7 168b8f38
From Go's standard library tests:
type golden struct {
out []byte
in string
…
}
var golden32 = []golden{
{[]byte{0x81, 0x1c, 0x9d, 0xc5}, "", … },
{[]byte{0x05, 0x0c, 0x5d, 0x7e}, "a", … },
{[]byte{0x70, 0x77, 0x2d, 0x38}, "ab", … },
{[]byte{0x43, 0x9c, 0x2f, 0x4b}, "abc", … },
}
var golden32a = []golden{
{[]byte{0x81, 0x1c, 0x9d, 0xc5}, "", … },
{[]byte{0xe4, 0x0c, 0x29, 0x2c}, "a", … },
{[]byte{0x4d, 0x25, 0x05, 0xca}, "ab", … },
{[]byte{0x1a, 0x47, 0xe9, 0x0b}, "abc", … },
}
var golden64 = []golden{
{[]byte{0xcb, 0xf2, 0x9c, 0xe4, 0x84, 0x22, 0x23, 0x25}, "", … },
{[]byte{0xaf, 0x63, 0xbd, 0x4c, 0x86, 0x01, 0xb7, 0xbe}, "a", … },
{[]byte{0x08, 0x32, 0x67, 0x07, 0xb4, 0xeb, 0x37, 0xb8}, "ab", … },
{[]byte{0xd8, 0xdc, 0xca, 0x18, 0x6b, 0xaf, 0xad, 0xcb}, "abc", … },
}
var golden64a = []golden{
{[]byte{0xcb, 0xf2, 0x9c, 0xe4, 0x84, 0x22, 0x23, 0x25}, "", … },
{[]byte{0xaf, 0x63, 0xdc, 0x4c, 0x86, 0x01, 0xec, 0x8c}, "a", … },
{[]byte{0x08, 0x9c, 0x44, 0x07, 0xb5, 0x45, 0x98, 0x6a}, "ab", … },
{[]byte{0xe7, 0x1f, 0xa2, 0x19, 0x05, 0x41, 0x57, 0x4b}, "abc", … },
}
I refactored the implementation to let it digest incrementally, like our Wasm implementations, to support these cases. However, when benchmarking it, I'm wondering whether a Wasm implementation might be better after all. The original rationale (https://github.com/denoland/deno_std/issues/2122#issuecomment-1114242933) for doing it in TypeScript was because it's simple enough to implement ourselves and there might be performance gains from avoiding the Wasm overhead. However, to my surprise the benchmark results seem to indicate the opposite. FNV is supposed to faster than cryptographic hashes, for cases where their strength isn't needed (and it looks like @std/http is still using it under that assumption), but take a look at this (with lots of caveats about the poor quality of my benchmarking environment, more validation required, etc.):
For tiny data (64 bytes), fastest to slowest:
For huge data (512 MiB), fastest to slowest:
At all sizes, our JavaScript FNV implementations are apparently slower than our Wasm implementations of cryptographically-strong hash functions. FNV64 in particular performs abysmally for large data, I'd guess due to considerable overhead from its implementation of 64-bit integer math operations on top of JavaScript's 64-bit floating-point Numbers.
If these results are accurate (I'll verify more carefully later), then it seems like it would make sense to move these implementations into our Rust/Wasm as well. There are several existing implementations on crates.io, however none of the widely-used ones provide both variants we need. Since it's only a handful of lines of logic, it might be safer just to implement it ourselves rather than add a dependency on a not-widely-used crate, and I'm leaning towards doing that for my PR.
edit: Trying a Wasm implementation locally, and I'm getting performance more like I expected. My results are still noisy and high-variance, but the FNV hashes are clearly faster than all of the cryptographic hashes, instead of being many times slower, and the code is simpler too. This seems like the way to go.
For tiny data (64 bytes), fastest to slowest:
For huge data (512 MiB), fastest to slowest:
The WASM bundle is probably a bit larger, but I can't immediately tell how much because I've also been cleaning up some no-longer-needed functionality that's cut the size down by 20%, dwarfing whatever the increase is. I'll break that into a separate PR so we can see the actual change from the new hashes before merging (edit: https://github.com/denoland/deno_std/pull/4510).
I was trying out some potential improvements to the
crypto_tests.ts
and noticed that FNV hash implementation requires that the data be provided as a singleBufferSource
, which is inconsistent with the cryptographic hashes, for which we also support the data being provided by an iterator/iterable or (for.digest()
) an async iterator/iterable (which is in part to support the WebCrypto Streams proposal).In practice, I doubt this is going to be used much, but given that it's being exposed by the same function, we should probably also support this for consistency.