Corpus of serialized histograms to aid interop testing with other implementations

To help ensure that different implementations all serialize and deserialize the same way, it would be helpful to have some examples of different flavors of serialized histograms. The sort of testing I'd like to be able to do with this is to check that, given the same recorded values as the Java implementation, my implementation produces the exact same serialized bytes, and similarly that deserializing the Java implementation's bytes yields the data that was originally recorded.

Here's a starting point for a list of stuff to express in some easily-parsed format:

Histogram parameters relevant to serialization: lowest value, highest value, significant digits, int/double ratio
Serialization format: V2, V2 + DEFLATE, etc
Serialized bytes, presumably base64'd
Values originally recorded
Description of some sort, like "every single value recorded once" or "all zero counts" or what have you
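
As a rough illustration of how another implementation's tests might consume one such entry, here is a minimal Python sketch. The `CorpusEntry` class, the `check_entry` helper, and the field names are all hypothetical, made up for this example. The toy codec is only a stand-in so the sketch runs; it is not the V2 format.

```python
import base64
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CorpusEntry:
    """One corpus example. Field names are assumptions, not an agreed schema."""
    desc: str
    values: List[int]
    lowest_discernible_value: int
    highest_trackable_value: int
    significant_digits: int
    int_to_double_ratio: float
    serialization_format: str
    serialized_bytes_b64: str

    def serialized_bytes(self) -> bytes:
        return base64.b64decode(self.serialized_bytes_b64)


def check_entry(
    entry: CorpusEntry,
    encode: Callable[[CorpusEntry], bytes],
    decode: Callable[[bytes], List[int]],
) -> None:
    """Check both directions against the reference bytes."""
    expected = entry.serialized_bytes()
    assert encode(entry) == expected, f"{entry.desc}: encoded bytes differ"
    assert decode(expected) == sorted(entry.values), f"{entry.desc}: decoded values differ"


# Toy stand-in codec so the sketch runs: NOT the HdrHistogram V2 format.
def toy_encode(entry: CorpusEntry) -> bytes:
    return b"".join(v.to_bytes(8, "big") for v in sorted(entry.values))


def toy_decode(data: bytes) -> List[int]:
    return [int.from_bytes(data[i:i + 8], "big") for i in range(0, len(data), 8)]


example = CorpusEntry(
    desc="every value from 1 to 3 recorded once",
    values=[1, 2, 3],
    lowest_discernible_value=1,
    highest_trackable_value=9223372036854775807,
    significant_digits=3,
    int_to_double_ratio=1.0,
    serialization_format="toy",
    serialized_bytes_b64="",
)
# Fill in the "reference" bytes with the toy codec, purely for demonstration.
example.serialized_bytes_b64 = base64.b64encode(toy_encode(example)).decode("ascii")
check_entry(example, toy_encode, toy_decode)
```

A real harness would plug in the implementation's own encoder and decoder, and would probably compare recorded counts rather than raw values, since a histogram only retains values to the configured precision.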

As far as format goes, it could certainly be done with JSON (you can do anything with JSON if you're willing to contort it enough, after all):

[
  {
    "desc": "All zero counts",
    "values": [],
    "lowest_discernible_value": 1,
    "highest_trackable_value": 9223372036854775807,
    "significant_digits": 3,
    "normalizing_offset": 0,
    "int_to_double_ratio": 1.0,
    "serialization_format": "V2",
    "serialized_bytes_b64": "..."
  },
  ...
]

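This is probably at least reasonably convenient for most things to deserialize, but it does come burdened with JavaScript's heritage of inconvenient handling of large numbers: a parser that stores every number as a double can't represent all integers above 2^53 exactly, and the highest_trackable_value above is 2^63 - 1. That's not to say that an implementation is required to parse the above with JavaScript's rules, of course.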
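
To make the large-number concern concrete, here is a small Python sketch. Python's `json` module parses integer literals into arbitrary-precision ints, so the value survives; passing `parse_int=float` is used here only to simulate a parser that keeps every number as a double, the way JavaScript's `JSON.parse` does.

```python
import json

doc = '{"highest_trackable_value": 9223372036854775807}'

# Default behaviour: integer literals become exact Python ints.
exact = json.loads(doc)["highest_trackable_value"]

# Simulated double-only parser: every integer literal is turned into a float.
as_double = json.loads(doc, parse_int=float)["highest_trackable_value"]
lossy = int(as_double)

print(exact == 2**63 - 1)  # the exact parse round-trips
print(lossy == exact)      # the double-based parse does not
```

Implementations in languages whose JSON libraries default to doubles would need to opt in to 64-bit integer parsing, or the corpus could store such values as strings.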

Another option would be to go with something barebones like "INI"-style files, or the somewhat better-specified flavor TOML, which might look something like this:

# Slight abuse of TOML's Tables to serve as description
["All zero counts"]
values = []
lowest_discernible_value = 1
highest_trackable_value = 9223372036854775807
significant_digits = 3
normalizing_offset = 0
int_to_double_ratio = 1.0
serialization_format = "V2"
serialized_bytes_b64 = "..."

["The next histogram"]
...

Even if there aren't convenient libraries, that should be pretty easy to parse by hand. It also allows comments.

Or, even simpler, we could just have one example per file and use the filename as the description with the contents as a simple key=value (aka properties) file.
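
A hand-rolled reader for that flavor could be as small as the following Python sketch. The `parse_properties` helper, the comma-separated encoding of `values`, and the sample payload are assumptions made for illustration only.

```python
import base64


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    entry = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so base64 padding in the value survives.
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got: {raw!r}")
        entry[key.strip()] = value.strip()
    return entry


SAMPLE = """\
# placeholder payload, not real histogram output
values=1,2,3
significant_digits=3
highest_trackable_value=9223372036854775807
serialization_format=V2
serialized_bytes_b64=AAECAw==
"""

props = parse_properties(SAMPLE)
values = [int(v) for v in props["values"].split(",") if v]
payload = base64.b64decode(props["serialized_bytes_b64"])
```

Everything comes back as a string, so each implementation would convert fields such as `highest_trackable_value` with its own 64-bit integer parser.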

Regardless of format, the code to emit these things would need to live somewhere. I think the simplest thing would be to have it live as a module in the Java project. That way, at least the Java project would have an easy time writing tests that use these sample files, even if the rest of the implementations would need some additional build system complexity to pull them in. Or, perhaps this should live in a new repo dedicated to such cross-implementation concerns, where design docs, etc., could also live? Once there's consensus around an approach, I'm happy to PR this.

Corpus of serialized histograms to aid interop testing with other implementations #122