cyberphone / json-canonicalization

JSON Canonicalization Scheme (JCS)

Host ES6 numbers testdata in a permanent and streamable manner #15

Closed: dsnet closed this issue 3 years ago

dsnet commented 3 years ago

This repository references an awesome test suite of 100 million numbers. However, the data is distributed as a ZIP file, a format that does not guarantee streamable parsing.

Can the following properties be provided?

  1. a permanent URL to the testdata,
  2. the URL be to a direct download, rather than a page with another link to click on, and
  3. that it be stored in a format that supports streaming (e.g., a simple GZIP'd file).

This would allow unit tests to fetch and process the testdata directly in a streaming manner, without storing the entire testdata on disk or occupying much memory. That is my use case, at least. I have no intention of making this something that's downloaded with every CI submission; it would be something that developers run manually.
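To illustrate the kind of streaming consumption meant here, a minimal Python sketch follows. It assumes the testdata is a GZIP'd text file with one ASCII record per line; the function name `iter_lines` is hypothetical, and no real download URL is implied.

```python
import gzip
import io
from typing import BinaryIO, Iterator


def iter_lines(compressed: BinaryIO) -> Iterator[str]:
    """Yield text lines from a gzip stream one at a time.

    Assumption: the payload is ASCII text with one record per line.
    Only a small decompression buffer is held in memory at any moment.
    """
    with gzip.open(compressed, mode="rt", encoding="ascii") as text:
        for line in text:
            yield line.rstrip("\n")


if __name__ == "__main__":
    # Stand-in for a network response: a small in-memory gzip payload.
    sample = io.BytesIO(gzip.compress(b"first record\nsecond record\n"))
    for record in iter_lines(sample):
        print(record)
```

Because `gzip.open` accepts any binary file-like object, the same function should in principle work on an HTTP response body read sequentially, so nothing needs to be written to disk first.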

dsnet commented 3 years ago

As a minor issue, it would be nice if the hexadecimal-encoded float were always zero-padded to 16 characters, rather than varying between 1 and 16 characters. This simplifies the parsing logic.
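A rough Python sketch of the parsing in question, assuming the hex string is the big-endian IEEE 754 bit pattern of a 64-bit double (the helper name `hex_to_float` is hypothetical). With variable-width input, the parser has to pad the string itself before decoding; with fixed 16-character input, that step would disappear.

```python
import struct


def hex_to_float(hex_bits: str) -> float:
    """Decode a hex-encoded IEEE 754 double (assumed big-endian bit pattern)."""
    # Variable-width input (1 to 16 hex digits) must be left-padded first.
    padded = hex_bits.zfill(16)
    if len(padded) != 16:
        raise ValueError(f"expected at most 16 hex digits, got {len(hex_bits)}")
    return struct.unpack(">d", bytes.fromhex(padded))[0]


if __name__ == "__main__":
    print(hex_to_float("3ff0000000000000"))  # full-width input
    print(hex_to_float("1"))                 # short input, padded before decoding
```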

dsnet commented 3 years ago

If you think GZIP is a suitable format, you can consider using https://github.com/google/zopfli. It will take a while to compress, but it produces smaller files than the gzip tool even at the -9 level. I believe GZIP is probably the most suitable format due to its ubiquity.

cyberphone commented 3 years ago

I have no problems with this except that I don't want to get thrown out of GitHub by storing large files.

dsnet commented 3 years ago

https://docs.github.com/en/github/managing-large-files/distributing-large-binaries might be helpful. It seems that GitHub allows large files of up to 2 GB to be attached as part of a "release".