HdrHistogram / HdrHistogram

A High Dynamic Range (HDR) Histogram
http://hdrhistogram.github.io/HdrHistogram/

Corpus of serialized histograms to aid interop testing with other implementations #122

Open marshallpierce opened 7 years ago

marshallpierce commented 7 years ago

To help ensure that different implementations all serialize and deserialize the same way, it would be helpful to have some examples of different flavors of serialized histograms. The sort of testing I'd like to be able to do with this is to check that given the same recorded values as the Java implementation, my implementation produces the exact same serialized bytes, and similarly that deserializing produces the original data that was recorded by the Java implementation.
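The two checks described above can be sketched as a small corpus-driven test. This is only an illustration: `toy_encode` and `toy_decode` are hypothetical stand-ins (they are not the HdrHistogram V2 wire format), and the entry is made up rather than emitted by the Java implementation. The field names follow the JSON layout proposed further down.

```python
import base64
import json
from typing import Callable, Iterable, List


# Hypothetical stand-in codec: NOT the HdrHistogram V2 wire format.
# A real port would plug in its own serializer and deserializer here.
def toy_encode(values: Iterable[int]) -> bytes:
    return json.dumps(sorted(values)).encode("ascii")


def toy_decode(data: bytes) -> List[int]:
    return json.loads(data.decode("ascii"))


def check_entry(entry: dict,
                encode: Callable[[Iterable[int]], bytes],
                decode: Callable[[bytes], List[int]]) -> List[str]:
    """Return a list of mismatch descriptions; empty means the entry passed."""
    problems = []
    expected = base64.b64decode(entry["serialized_bytes_b64"])
    # Direction 1: the same recorded values must yield byte-identical output.
    if encode(entry["values"]) != expected:
        problems.append(f"{entry['desc']}: serialized bytes differ")
    # Direction 2: the reference bytes must decode back to the recorded values.
    if sorted(decode(expected)) != sorted(entry["values"]):
        problems.append(f"{entry['desc']}: decoded values differ")
    return problems


# A made-up corpus entry built with the toy codec, standing in for one
# that the reference implementation would emit.
entry = {
    "desc": "Three small values",
    "values": [3, 1, 2],
    "serialized_bytes_b64": base64.b64encode(toy_encode([3, 1, 2])).decode("ascii"),
}

result = check_entry(entry, toy_encode, toy_decode)
print(result)
```

Reporting both directions separately makes it easier to tell whether a port's writer or its reader is the part that disagrees with the reference bytes.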

Here's a starting point for a list of stuff to express in some easily-parsed format:

As far as format goes, it could certainly be done with JSON (you can do anything with JSON if you're willing to contort it enough, after all):

[
  {
    "desc": "All zero counts",
    "values": [],
    "lowest_discernible_value": 1,
    "highest_trackable_value": 9223372036854775807,
    "significant_digits": 3,
    "normalizing_offset": 0,
    "int_to_double_ratio": 1.0,
    "serialization_format": "V2",
    "serialized_bytes_b64": "..."
  },
  ...
]

This is probably at least reasonably convenient for most things to deserialize, but it does come burdened with JavaScript's heritage of inconvenient handling of large numbers. That's not to say that an implementation is required to parse the above with JavaScript's rules, of course.
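To make the large-number concern concrete, here is a minimal Python sketch. It parses the `highest_trackable_value` from the example above twice: once with exact integers, and once with every integer forced through a 64-bit double, which mimics what a JavaScript-style parser would do.

```python
import json

doc = '{"highest_trackable_value": 9223372036854775807}'

# Python's json module parses integer literals as arbitrary-precision ints.
exact = json.loads(doc)["highest_trackable_value"]

# Forcing every integer through a 64-bit double mimics a JavaScript-style parser.
lossy = json.loads(doc, parse_int=float)["highest_trackable_value"]

print(exact == 2**63 - 1)   # True: the value survives intact
print(int(lossy) == 2**63)  # True: the double rounded it up by one
```

An implementation that reads the corpus through a double-based parser would therefore see a different `highest_trackable_value` than the one that was written.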

Another option would be to go with something barebones like "INI"-style files (or the somewhat better-specified flavor, TOML), which might look something like this:

# Slight abuse of TOML's Tables to serve as description
["All zero counts"]
values = []
lowest_discernible_value = 1
highest_trackable_value = 9223372036854775807
significant_digits = 3
normalizing_offset = 0
int_to_double_ratio = 1.0
serialization_format = "V2"
serialized_bytes_b64 = "..."

["The next histogram"]
...

Even if there aren't convenient libraries, that should be pretty easy to parse by hand. It also allows comments.
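As a rough indication of how little hand-parsing is involved, here is a sketch that handles only the subset used in the example: full-line comments, quoted table names, and `key = value` pairs whose values are integers, floats, double-quoted strings, or flat arrays. The function names `parse_value` and `parse_corpus` are made up for illustration, and this is not a conforming TOML parser (no escapes, inline comments, or nested tables).

```python
def parse_value(token: str):
    """Parse one value: quoted string, flat array, int, or float."""
    if len(token) >= 2 and token.startswith('"') and token.endswith('"'):
        return token[1:-1]
    if token.startswith("[") and token.endswith("]"):
        inner = token[1:-1].strip()
        # Naive split: assumes array items contain no commas themselves.
        return [parse_value(t.strip()) for t in inner.split(",")] if inner else []
    try:
        return int(token)
    except ValueError:
        return float(token)


def parse_corpus(text: str) -> dict:
    """Map each quoted table name to a dict of its key/value pairs."""
    tables = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and full-line comments
        if line.startswith('["') and line.endswith('"]'):
            current = tables.setdefault(line[2:-2], {})
            continue
        if current is None:
            raise ValueError(f"key outside of any table: {line!r}")
        key, _, value = line.partition("=")
        current[key.strip()] = parse_value(value.strip())
    return tables


sample = """
# Slight abuse of TOML's Tables to serve as description
["All zero counts"]
values = []
lowest_discernible_value = 1
highest_trackable_value = 9223372036854775807
significant_digits = 3
int_to_double_ratio = 1.0
serialization_format = "V2"
"""

corpus = parse_corpus(sample)
print(corpus["All zero counts"]["highest_trackable_value"])
```

Because integers are tried before floats, `highest_trackable_value` stays an exact integer rather than passing through a double.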

Or, even simpler, we could just have one example per file and use the filename as the description with the contents as a simple key=value (aka properties) file.

Regardless of format, the code to emit these things would need to live somewhere. I think the simplest option would be for it to live as a module in the Java project. That way, at least the Java project would have an easy time writing tests that use these sample files, even if the other implementations would need some additional build system complexity to pull them in. Or perhaps this should live in a new repo dedicated to such cross-implementation concerns, where design docs, etc., could also live? Once there's consensus around an approach, I'm happy to PR this.

ahothan commented 7 years ago

Thanks for bringing this up. A few comments: these "compliance templates" should be accessible directly from the internet using a well-defined method (a REST API, for example). This would abstract away the actual storage location of the templates and take care of the "how do I bring the templates into my workspace to test my implementation" problem.

As for the format of the templates, my order of preference would be: 1) YAML 2) JSON