The state diagram in the aiohttp docs suggests that on_request_end is the callback that is called when the response has been completely received, however, looking at the implementation in aiohttp, this is not correct. Instead, this callback is invoked when the first byte (or initial chunk) of the response is received, see (initial await and callback invocation) but not when the response body has been read. This happens in higher-level code (see aio implementation of elastic-transport that reads the body).
Consequently, Rally stops the timer when the first byte of the response is received, instead of the entire response. As any code that uses the Elasticsearch client cannot make any progress before the body has been received, service time measurement is too "optimistic" at the moment. In many cases the difference will be small but if the response is large this can make a difference of several hundred milliseconds.
Steps to reproduce:
Run a Rally track that produces large responses (e.g. Universal Profiling, only available internally at the moment)
Observe the time difference between using the current callback and the fixed version (see below)
Fix
The following snippet fixes the issue:
diff --git a/esrally/client/factory.py b/esrally/client/factory.py
index 4a4c0179..daaaeeee 100644
--- a/esrally/client/factory.py
+++ b/esrally/client/factory.py
@@ -225,7 +225,7 @@ class EsClientFactory:
trace_config = aiohttp.TraceConfig()
trace_config.on_request_start.append(on_request_start)
- trace_config.on_request_end.append(on_request_end)
+ trace_config.on_response_chunk_received.append(on_request_end)
# ensure that we also stop the timer when a request "ends" with an exception (e.g. a timeout)
trace_config.on_request_exception.append(on_request_end)
Rally version (get with
esrally --version
):esrally 2.10.0.dev0 (git revision: df49d4d2c45d3dbcd2e6791524b0c5812430105e)
Invoked command: N/A
Configuration file (located in
~/.rally/rally.ini
)): N/AJVM version: N/A
OS version: N/A
Description of the problem including expected versus actual behavior:
Rally uses aiohttp tracing callbacks to measure service time, specifically:
on_request_start
to start a timeron_request_end
to stop the timerThe state diagram in the aiohttp docs suggests that
on_request_end
is the callback that is called when the response has been completely received, however, looking at the implementation in aiohttp, this is not correct. Instead, this callback is invoked when the first byte (or initial chunk) of the response is received, see (initial await and callback invocation) but not when the response body has been read. This happens in higher-level code (see aio implementation of elastic-transport that reads the body).Consequently, Rally stops the timer when the first byte of the response is received, instead of the entire response. As any code that uses the Elasticsearch client cannot make any progress before the body has been received, service time measurement is too "optimistic" at the moment. In many cases the difference will be small but if the response is large this can make a difference of several hundred milliseconds.
Steps to reproduce:
Fix
The following snippet fixes the issue:
The fix works because Rally continuously updates the end time on each callback invocation so in the end, the timestamp corresponds to the timestamp when the last chunk has been received.