HdrHistogram / hdrhistogram-go

A pure Go implementation of Gil Tene's HDR Histogram.
MIT License
429 stars 64 forks source link

Compressed histogram V2 support #31

Closed filipecosta90 closed 3 years ago

filipecosta90 commented 3 years ago

This is a WIP PR to share the current status of compressed histogram v2 encoding and decoding. wanted to pick your brain @ahothan on where I might be failing on the porting.

current to be added APIs

quick notes

Please notice that:

current issue

The current issue to be discussed is that taking as examples the encoded histograms from JS, python,etc... we're able to decode them, but if we encode them again the output based64 is different from the original one. Here is a good example of it:

GO111MODULE=on go test -count=1 ./...
--- FAIL: TestHistogram_Load (0.00s)
    hdr_encoding_test.go:27: The input and decoded->encoded representations differ:
        Original base64 input: HISTFAAAAB542pNpmSzMwMDAxAABzFCaEUoz2X+AMIKZAEARAtM=
        Output base64 encoded: HISTFAAAACZ42pJpmSzMwMDAxMDAwMjAwMDMAAGMUJoJxg9mAgQAAP//NU0Bpg==
        Differences (-got +want):
          bytes.Join({
                "HISTFAAAA",
        -       "B542pNpmSzMwMDAxAABzFCaEUoz2X+AMIKZAEARAtM",
        +       "CZ42pJpmSzMwMDAxMDAwMjAwMDMAAGMUJoJxg9mAgQAAP//NU0Bpg=",
                "=",
          }, "")
FAIL
FAIL    github.com/HdrHistogram/hdrhistogram-go 0.229s
FAIL
make: *** [test] Error 1
ahothan commented 3 years ago

Thanks for the PR, I'll try to spend time on this sometime this week, I will try to port some of the test cases I have in python version that create a histogram from scratch and verify that the encoded histogram are similar. Once the encoding is verified we can have a closer look at the decoding side, Clearly decode+encode should be idempotent.

ahothan commented 3 years ago

We should probably only support v2 format, don't think anybody is using V1 any more.

ahothan commented 3 years ago

Looks good. The reason the recode of a decoded histoblob coming from another implementation is different is due to the difference in compression library implementation. As long as they can decode properly, it should be fine.