HdrHistogram / hdrhistogram-go

A pure Go implementation of Gil Tene's HDR Histogram.
MIT License
429 stars 64 forks source link

add tests for simple integer timestamps since epoch #23

Open okayzed opened 7 years ago

okayzed commented 7 years ago

add tests for integer timestamp values.

On my local machine, the test for seconds since epoch (first test case) is failing, while the test for ms since epoch (second test case) succeeds.

okayzed commented 7 years ago
--- FAIL: TestTimeStamps (0.00s)
    hdr_test.go:408: Median was 2147483647, but expected 1476583605 +/- 1.476583e+06
FAIL
exit status 1
FAIL    github.com/codahale/hdrhistogram    0.184s
filipecosta90 commented 3 years ago

imho I believe we can close this one @ahothan. The added tests does not seem to bring anything new. wdyt?

okayzed commented 3 years ago

was the behavior fixed? this issue is from 4 years ago, i decided not to use hdrhistogram-go at the time because of this fault (among other things).

the added test shows where the values are incorrect when trying to use integer seconds since epoch. i think the test is worthwhile to keep to prevent regressions unless you already have some tests that cover this case.

filipecosta90 commented 3 years ago

was the behavior fixed? this issue is from 4 years ago, i decided not to use hdrhistogram-go at the time because of this fault (among other things).

the added test shows where the values are incorrect when trying to use integer seconds since epoch. i think the test is worthwhile to keep to prevent regressions unless you already have some tests that cover this case.

@okayzed this is a corner case in which all values are within the same sub-bucket of the histogram. Given that ValueAtQuantile will return the largest equivalent value of the sub-bucket for which the condition (100% - percentile) is true we get 2147483647.

@ahothan confirmed that his will also be true for all quantiles given that all 5 samples are within the same sub-bucket.

Added the check that the values 1476583605 and 2147483647 are equivalent given the specified histogram here

okayzed commented 3 years ago

how far apart would the timestamps need to be before the behavior is adjusted to give correct values? from the test case in this PR, the histogram is setup to expect values from TS-10,000 to TS+10,000

    hist := hdrhistogram.New(input[2] - 10000, input[2] + 10000, 3)

is there some workaround you can suggest to get more accurate values? (do you always ground the histogram at 0 if the min value is positive?)

giltene commented 3 years ago

I think that this is a case of misunderstanding the meaning of the lowest discernible value, with the misunderstanding made much more likely due to an unfortunate naming of the parameters to [ func New(minValue, maxValue int64, sigfigs int) ] in the go implementation. I would suggest changing the parameter names to e.g. match the ones used in the java implementation to make the meaning clear.

The lowest discernible value is NOT intended to be the minimum in a range of numbers covered by the histogram. Far from it. It is the smallest unit size that is discernible. Specifically, it would be correct to treat all numbers between 0 and the minimum discernible as equivalent, all values between the minimum discernible value and 2x that value as equivalent, etc. if that is not the behavior you want, and if you want a histogram that is accurate to within the number of decimal places specified, but can tell e.g. 1 and 2 apart, just use 1 as the minimum discernible value (and that is the default most people use, where they don’t even specify a lowest discernible value). The main use of a non-1 lowest discernible value is for cases where the units of measurement are much finer than the result ion desired. E.g. when measuring time values in nanoseconds in an application domain where there is no interest in e.g. telling apart numbers that are less than 20 microseconds apart. Setting the lowest discernible value to 20000 nanoseconds in this example, or to e.g. 1000000 nanosecond when the lowest discernible level desired is a millisecond, and similar use cases, helps save some histogram storage (eliminating several buckets and several hundreds or thousands of sub buckets), without requiring the caller to convert input from e.g. nanoseconds to milliseconds on each recording.

Clarifying the documentation and changing the parameter names will hopefully help resolve the confusion here.