Closed dsnet closed 3 years ago
As a minor issue, it would be nice if the hexadecimal encoded float was always a zero-padded 16B number, rather than varying between 1-16. This simplifies the parsing logic.
If you think GZIP is a suitable format, you can consider using https://github.com/google/zopfli. It will take a while to compress, but it produces smaller files than the gzip
tool even at -9
level. I believe GZIP is probably the most suitable format due to it's pervasive influence.
I have no problems with this except that I don't want to get thrown out of GitHub by storing large files.
https://docs.github.com/en/github/managing-large-files/distributing-large-binaries might be helpful. It seems that GitHub allows large files up to 2GB attached as part of a "release".
This repository has a reference to a awesome test suite of 100m numbers. However, the format of the data is a ZIP file, which does not guarantee streamable parsing.
Can the following properties be provided?
This would allow unit tests to directly fetch, download, and process the testdata in a streaming manner without requiring the storage of the entire testdata on disk, nor occupying much memory. At least for my use-case. I have no intention of making this something that's downloaded with every CI submission, but something that's manually run by developers.