HdrHistogram / hdrhistogram-go

A pure Go implementation of Gil Tene's HDR Histogram.
MIT License
429 stars 64 forks source link

Export and import histograms #1

Closed klizhentas closed 9 years ago

klizhentas commented 9 years ago

Hi Coda!

I have a case when I'm collecting histograms from various data sources via API and would like to merge them to get the combined histogram.

What do you think is the best way to export/import histograms for this package?

One way to do this is to add Export() Dump and Import(Dump) (*Histogram, error) functions that would operate on the following data structure:

type Dump struct {
    Version int
    LowestTrackableValue int64
    HighestTrackableValue int64
    SignificantFigures int64
    Counts []int64
}

The other approach is to make the raw iterator public, so I can implement any serialization/de-serialization logic in my app just by using iterator over counter values.

I can send a PR if you think that any of these approaches makes sense.

codahale commented 9 years ago

I kind of like the Import/Export idea. I'd be happy to accept a PR for it.

tylertreat commented 9 years ago

I've been looking at something similar myself. I have a case where I'd like to send some histograms over the wire. I managed to implement a serialization scheme along the lines of the original HdrHistogram, but it feels awfully hacky.

I'm a little torn. It'd be nice to maintain compatibility between this and the other versions, but I almost think a simple serialization would be preferable, even if it's a trivial gob encoding. In which case, maybe it doesn't make sense to include it in the library. Frankly, there probably aren't a lot of situations where you actually need compatibility.

Just exposing the iterator might make more sense since sending the histogram itself over the wire might be unnecessary.

I'd be interested to hear your guys' thoughts and perhaps send a PR your way @codahale.

kr commented 9 years ago

I would love to convince @josephruscio to make Librato receive whatever serialization format this produces, and to merge samples over time so we can see histograms with good error bounds at a coarser time granularity. Currently, our graphs (using the max of each quantile over a given timespan) are distinctly misleading when we zoom out farther. I suppose this might be a reason to prefer an interoperable format.

codahale commented 9 years ago

I think just adding a Snapshot() *Snapshot method and then deferring serialization to a later date makes sense. Just return a pointer to a struct, and folks can do with that what they want.

codahale commented 9 years ago

Closed by #5.