Closed: dsnet closed this issue 3 years ago.
Yes, it is indeed possible to do what you propose. All it requires is a pseudo-random number generator with a known algorithm, so that all testers get the same values. I'm personally bogged down by related work, which is why I haven't been able to make progress on this. The original file was created with a cryptographically secure random number generator, so the result is not repeatable. I believe 100M pseudo-random 64-bit numbers plus the special cases would be entirely sufficient.
Here's a short Go program that deterministically generates an es6testfile100m.txt file. The PRNG is based on SHA-256, since we're going to need a hash anyway. The generator (i.e., the `next` function) is ~65 lines (~20 for the code itself and ~45 for the global state). The first 1114 entries of the output are identical to the current es6testfile100m.zip dataset.
Here are the file sizes and SHA-256 hashes of the output for various numbers of entries:
Entries | File size (bytes) | File SHA-256 hash |
---|---|---|
1k | 38054 | ace079ffc98dfc66de4a1ea503baa3fd21dcae86c7fc1a4b7470715c825737f0 |
10k | 401124 | e1aead772d79a53df95289caf42f04b0f4fe1cf70058040e27bbd8f03a78b11c |
100k | 4033821 | ad12990add6d0b303e356a7aef76c3249ed00ab870fd01ea5d3366630edb48ba |
1m | 40359517 | 2b567bd9e82257b5b4ed2bec3e0ecc910722a8566ef0538d0a348c89bf82b9f1 |
10m | 403632090 | e48ee378494fa771a9fa109b1b52825cf30bdf4e59601dfc8b4895322d805a8f |
100m | 4036328199 | bed4cf9a666be044bbbe243f3465b666d3b9e1def461932f451aad5ad8c07324 |
Using `compress/gzip`, the 100m file is 2081257067 bytes (1.94 GiB) compressed, which should be just below the 2 GiB limit for GitHub releases (to satisfy #15).
Here's a better generator: https://play.golang.org/p/yMVOf6kqS27
The main difference is that it incorporates all of the entries from Appendix B of the RFC.
Entries | File size (bytes) | File SHA-256 hash |
---|---|---|
1k | 38054 | be18b62b6f69cdab33a7e0dae0d9cfa869fda80ddc712221570f9f40a5878687 |
10k | 401124 | b9f7a8e75ef22a835685a52ccba7f7d6bdc99e34b010992cbc5864cd12be6892 |
100k | 4033821 | 22776e6d4b49fa294a0d0f349268e5c28808fe7e0cb2bcbe28f63894e494d4c7 |
1m | 40359517 | 49415fee2c56c77864931bd3624faad425c3c577d6d74e89a83bc725506dad16 |
10m | 403632090 | b9f8a44a91d46813b21b9602e72f112613c91408db0b8341fb94603d9db135e0 |
100m | 4036328199 | 0f7dda6b0837dde083c5d6b896f7d62340c8a2415b0c7121d83145e08a755272 |
EDIT: I ported the above Go program to Node.js and verified that the results are identical on two different architectures (Intel i7 and Apple M1).
Thanks for your work!
This is an alternative to #15.
I'd like to write a test that ensures formatting is canonical for a large suite of numbers and that requires no network bandwidth when the test passes.
For the set of floating-point numbers chosen in es6testfile100m.zip, can the sequence of input numbers be generated with a simple program? If so, we can specify a simple program that generates just those numbers, and then have the test generate the equivalent of es6testfile100m.txt and hash it. If the hash matches, we have confidence that the test passed. If it fails, we have to download the test file to figure out which entry differed.

As far as I can tell, there seems to be a pattern to how the first 1144 entries were produced, but everything afterwards appears random. Do we know how the numbers after that point were generated?