This PR updates the Prometheus queries for OSD read/write used in the FIO benchmark to ensure consistent data is delivered.
Fixes
irate to rate
irate has been changed to rate, because irate can produce graphs like the following:
This has caused tests in which the throughput of all OSDs was reported as 0 bytes/sec, regardless of the test being performed.
The reason is that irate uses only the last 2 data points in the range, whereas rate computes the average per-second increase over the entire range.
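The difference can be illustrated with a small Python sketch. It is a simplified model rather than Prometheus's actual implementation (no counter-reset handling or extrapolation), and the 5-second scrape interval, the sample values, and the function names are all hypothetical. It shows one way irate can report 0 while data is still flowing: if the last two samples fall between two refreshes of the counter, they are identical.

```python
# Simplified stand-ins for PromQL rate() and irate(); not the real implementation.
# Each sample is a (timestamp_seconds, counter_value_bytes) tuple.

def simple_rate(samples):
    """Average per-second increase between the first and last sample in the range."""
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    return (v_last - v_first) / (t_last - t_first)

def simple_irate(samples):
    """Per-second increase between only the last two samples in the range."""
    (t_prev, v_prev), (t_last, v_last) = samples[-2], samples[-1]
    return (v_last - v_prev) / (t_last - t_prev)

# Hypothetical data: the counter is scraped every 5 s, but its value only
# refreshes every 15 s, jumping by 15 MiB at each refresh.
REFRESH_S = 15
SCRAPE_S = 5
BYTES_PER_REFRESH = 15 * 1024 * 1024

samples = [(t, (t // REFRESH_S) * BYTES_PER_REFRESH) for t in range(0, 66, SCRAPE_S)]

print("irate:", simple_irate(samples))  # last two samples are equal
print("rate: ", simple_rate(samples))   # averaged over the whole 65 s range
```

In this model the last two samples (at 60 s and 65 s) carry the same value, so the irate stand-in returns 0 while the rate stand-in still reports a positive throughput.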
65 Second interval
The interval has been extended to 65 seconds to avoid a sampling issue. Based on my tests, ceph_osd_op_*_bytes refreshes every 15 seconds (see below):
A 65-second interval was chosen to ensure that at least 4 refreshes are always captured.
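The arithmetic behind that choice can be checked with a short Python sketch. It assumes perfectly regular 15-second refreshes and whole-second offsets, and it does not model scrape jitter. The function names are hypothetical.

```python
def refreshes_in_window(window_s, refresh_s, offset_s):
    """Count refresh instants offset_s, offset_s + refresh_s, ... that fall in [0, window_s]."""
    count = 0
    t = offset_s
    while t <= window_s:
        count += 1
        t += refresh_s
    return count

def refresh_bounds(window_s, refresh_s=15):
    """Return (min, max) refresh count over every whole-second alignment of the window."""
    counts = [refreshes_in_window(window_s, refresh_s, off) for off in range(refresh_s)]
    return min(counts), max(counts)

low, high = refresh_bounds(65)
print(f"65 s window, 15 s refresh: between {low} and {high} refreshes")
```

Under these assumptions a 65-second window always contains 4 or 5 refreshes, depending on how the window aligns with the refresh cycle.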
An example of the graph the Prometheus query creates is below: