Closed — aboudreault closed this 3 years ago
Here is a small script used to benchmark the improvement:
```python
import time
from random import random

from greplin import scales

stats = scales.collection('test', scales.PmfStat('request_timer'))
request_timer = stats.request_timer

start = time.time()
for i in range(1000000):
    request_timer.addValue(random())
end = time.time()

# Wait before adding one last value; 20 seconds ensures a rescale is triggered.
time.sleep(20)
request_timer.addValue(0)

print 'Recorded 1000000 metrics in {0:10.8f} seconds.'.format(end - start)
print 'Min: {0:10.8f}'.format(request_timer['min'])
print 'Max: {0:10.8f}'.format(request_timer['max'])
print 'Mean: {0:10.8f}'.format(request_timer['mean'])
print '75percentile: {0:10.8f}'.format(request_timer['75percentile'])
print '95percentile: {0:10.8f}'.format(request_timer['95percentile'])
print '98percentile: {0:10.8f}'.format(request_timer['98percentile'])
print '99percentile: {0:10.8f}'.format(request_timer['99percentile'])
print '999percentile: {0:10.8f}'.format(request_timer['999percentile'])
```
Here are the results, before and after the change:

```
Recorded 1000000 metrics in 47.78437400 seconds.
Recorded 1000000 metrics in 5.423353200 seconds.
```
We hit performance issues when metrics were enabled in our application. This PR improves the performance of the rescale code, notably by using `heapq`.

The work is mostly a port of the pyformance rescale code; thanks to its authors for their work.
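To illustrate the kind of change involved, here is a minimal sketch of how a bounded, priority-weighted reservoir can be maintained with `heapq` instead of repeatedly sorting its samples. This is a hypothetical simplification for illustration, not the actual `scales` or pyformance API; the class name, `size` default, and `rescale` signature are assumptions.

```python
import heapq


class DecayingSample(object):
    """Bounded reservoir kept as a min-heap of (priority, value) pairs.

    A hypothetical sketch: real exponentially decaying reservoirs derive
    the priority from a decay weight and a random draw, and rescale
    priorities periodically to avoid floating-point overflow.
    """

    def __init__(self, size=1028):
        self.size = size
        self.heap = []  # min-heap ordered by priority

    def add(self, value, priority):
        if len(self.heap) < self.size:
            heapq.heappush(self.heap, (priority, value))
        elif priority > self.heap[0][0]:
            # Evict the lowest-priority sample in O(log n) instead of
            # rebuilding or re-sorting the whole reservoir.
            heapq.heapreplace(self.heap, (priority, value))

    def rescale(self, factor):
        # Re-weight every priority, then restore the heap invariant in O(n).
        self.heap = [(p * factor, v) for p, v in self.heap]
        heapq.heapify(self.heap)
```

The point is that both the hot path (`add`) and the periodic `rescale` avoid any full sort: insertion and eviction are O(log n) via `heapreplace`, and rescaling is a single O(n) `heapify`.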