gbanasiak closed this 5 days ago
After adding the following ad-hoc logs:
```diff
$ git diff
diff --git a/esrally/client/factory.py b/esrally/client/factory.py
index 2b5b99e3..950718b0 100644
--- a/esrally/client/factory.py
+++ b/esrally/client/factory.py
@@ -219,9 +219,18 @@ class EsClientFactory:
                 return super().loads(data)
 
         async def on_request_start(session, trace_config_ctx, params):
+            self.logger.debug("on_request_start")
             RallyAsyncElasticsearch.on_request_start()
 
         async def on_request_end(session, trace_config_ctx, params):
+            self.logger.debug("on_request_end")
+
+        async def on_response_chunk_received(session, trace_config_ctx, params):
+            self.logger.debug("on_response_chunk_received")
+            RallyAsyncElasticsearch.on_request_end()
+
+        async def on_request_exception(session, trace_config_ctx, params):
+            self.logger.debug("on_request_exception")
             RallyAsyncElasticsearch.on_request_end()
 
         trace_config = aiohttp.TraceConfig()
@@ -235,9 +244,11 @@ class EsClientFactory:
         # Rally's implementation of the `on_request_end` callback handler updates the timestamp on every call, Rally
         # will ultimately record the time when it received the *last* chunk. This is what we want because any code
         # that is using the Elasticsearch client library can only act on the response once it is fully received.
-        trace_config.on_response_chunk_received.append(on_request_end)
+        trace_config.on_response_chunk_received.append(on_response_chunk_received)
+        # for logging purposes
+        trace_config.on_request_end.append(on_request_end)
         # ensure that we also stop the timer when a request "ends" with an exception (e.g. a timeout)
-        trace_config.on_request_exception.append(on_request_end)
+        trace_config.on_request_exception.append(on_request_exception)
         # override the builtin JSON serializer
         self.client_options["serializer"] = LazyJSONSerializer()
```
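For context, the timing logic behind these callbacks can be modeled without aiohttp at all. The following is a simplified, synchronous sketch (plain Python; `RequestTimer` and `simulate` are illustrative names, not Rally's actual code) of how the per-request end timestamp depends on which trace events have a registered callback:

```python
import time


class RequestTimer:
    """Hypothetical stand-in for Rally's per-request timing bookkeeping."""

    def __init__(self):
        self.start = None
        self.end = None

    def on_request_start(self):
        self.start = time.monotonic()

    def on_request_end(self):
        # Called on every chunk; the last call wins, so the recorded end
        # time is when the *last* chunk was received.
        self.end = time.monotonic()


def simulate(events, hooks):
    """Dispatch trace events to registered callbacks.

    Mimics aiohttp's TraceConfig signals in a simplified, synchronous form.
    """
    for event in events:
        callback = hooks.get(event)
        if callback is not None:
            callback()


timer = RequestTimer()
# Mirror the registration after elastic/rally#1823: the end timestamp is
# only updated from chunk callbacks and from exception callbacks.
hooks = {
    "on_request_start": timer.on_request_start,
    "on_response_chunk_received": timer.on_request_end,
    "on_request_exception": timer.on_request_end,
}

# A timeout that aiohttp reports via a plain "end" event (no chunks, no
# exception event) never stops the timer under this registration:
simulate(["on_request_start", "on_request_end"], hooks)
print(timer.end)  # None
```

This is the same gap the captured log sequences below expose: a request that finishes in the `end` state without chunks or an exception leaves the end timestamp unset.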
And configuring `logging.json` as follows:

```json
"loggers": {
    "elasticsearch": {
        "handlers": ["rally_log_handler"],
        "level": "DEBUG",
        "propagate": false
    },
    "rally.profile": {
        "handlers": ["rally_profile_handler"],
        "level": "INFO",
        "propagate": false
    },
    "elastic_transport": {
        "handlers": ["rally_log_handler"],
        "level": "DEBUG",
        "propagate": false
    },
    "esrally.client.factory": {
        "handlers": ["rally_log_handler"],
        "level": "DEBUG",
        "propagate": false
    }
}
```
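An equivalent logger setup can be reproduced with the standard library's `logging.config.dictConfig`; a minimal sketch (the handler name and format here are placeholders, not Rally's actual `rally_log_handler`):

```python
import logging
import logging.config

LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "plain": {"format": "%(asctime)s %(name)s %(levelname)s %(message)s"},
    },
    "handlers": {
        # Placeholder for Rally's rally_log_handler.
        "console": {"class": "logging.StreamHandler", "formatter": "plain"},
    },
    "loggers": {
        "esrally.client.factory": {
            "handlers": ["console"],
            "level": "DEBUG",
            "propagate": False,
        },
    },
}

logging.config.dictConfig(LOGGING)
logger = logging.getLogger("esrally.client.factory")
logger.debug("on_request_start")  # visible now that the logger is at DEBUG
```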
The following sequences were captured:
```
# healthy request
2024-06-26 12:54:19,209 ActorAddr-(T|:38747)/PID:26852 esrally.client.factory DEBUG on_request_start
2024-06-26 12:54:26,457 ActorAddr-(T|:38747)/PID:26852 esrally.client.factory DEBUG on_request_end
2024-06-26 12:54:26,473 ActorAddr-(T|:38747)/PID:26852 esrally.client.factory DEBUG on_response_chunk_received
2024-06-26 12:54:26,474 -not-actor-/PID:26852 elastic_transport.node DEBUG > PUT /_bulk HTTP/1.1
[.. request / response ..]
2024-06-26 12:54:26,485 -not-actor-/PID:26852 elastic_transport.transport INFO PUT https://REDACTED.elastic.cloud:443/_bulk [status:200 duration:7.276s]

# with timeout and on_request_exception
2024-06-26 12:54:26,517 ActorAddr-(T|:38747)/PID:26852 esrally.client.factory DEBUG on_request_start
2024-06-26 12:54:37,440 ActorAddr-(T|:38747)/PID:26852 esrally.client.factory DEBUG on_request_exception
2024-06-26 12:54:37,441 -not-actor-/PID:26852 elastic_transport.node DEBUG > PUT /_bulk HTTP/?.?
[.. request ..]
elastic_transport.ConnectionTimeout: Connection timeout caused by: TimeoutError()
2024-06-26 12:54:37,447 -not-actor-/PID:26852 elastic_transport.transport INFO PUT https://REDACTED.elastic.cloud:443/_bulk [status:N/A duration:10.930s]

# with timeout but without on_request_exception
2024-06-26 12:54:48,479 ActorAddr-(T|:38747)/PID:26852 esrally.client.factory DEBUG on_request_start
2024-06-26 12:54:59,438 ActorAddr-(T|:38747)/PID:26852 esrally.client.factory DEBUG on_request_end
2024-06-26 12:54:59,440 -not-actor-/PID:26852 elastic_transport.node DEBUG > PUT /_bulk HTTP/?.?
[.. request ..]
elastic_transport.ConnectionTimeout: Connection timeout caused by: TimeoutError()
2024-06-26 12:54:59,445 -not-actor-/PID:26852 elastic_transport.transport INFO PUT https://REDACTED.elastic.cloud:443/_bulk [status:N/A duration:10.966s]
```
This suggests that some timeouts lead to the `end` state instead of the `exception` state in aiohttp (see here). After https://github.com/elastic/rally/pull/1823, the `end` state is no longer handled by Rally, as the callback for `on_request_end` was removed in favor of the `on_response_chunk_received` callback.

I'm checking whether restoring the `on_request_end` callback while keeping `on_response_chunk_received` is enough to eliminate the bug.
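The idea behind the fix can be sketched with the same simplified, synchronous model of aiohttp's trace signals (plain Python, no aiohttp; names are illustrative, not Rally's actual code):

```python
import time


class Timer:
    """Minimal stand-in for Rally's request-end timestamp."""

    def __init__(self):
        self.end = None

    def stop(self):
        self.end = time.monotonic()


def fire(events, hooks):
    # Simplified dispatcher standing in for aiohttp's TraceConfig signals,
    # which keep a list of callbacks per event.
    for event in events:
        for callback in hooks.get(event, []):
            callback()


timer = Timer()
# Proposed registration: keep on_response_chunk_received so chunked
# responses record the time of the *last* chunk, but also restore
# on_request_end so a timeout that surfaces as a plain "end" event
# (no chunks, no exception) still stops the timer.
hooks = {
    "on_response_chunk_received": [timer.stop],
    "on_request_end": [timer.stop],
    "on_request_exception": [timer.stop],
}

# Timeout variant that emits only an "end" event:
fire(["on_request_end"], hooks)
assert timer.end is not None  # the timer is now stopped in this path too
```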
Test results looked fine, so I've raised a fix in https://github.com/elastic/rally/pull/1861.
Noticed with latest Rally version:

When running the `nyc_taxis` track with the `update` challenge, the following stack trace was observed: