Open oschaaf opened 5 years ago
@oschaaf could you expand the description a bit to help me understand what the issue / scope is?
It’s not super uncommon that worker x gets lucky and observes better latencies than worker y. When the difference is significant we may want to call that out. It can imply a noisy run, but when that is not the case it is an interesting piece of information. We do report per worker in our proto/json output, so this would be a UX thing for the CLI I guess. And maybe a log warning.
A next step would be to also do this per connection, which is a little more targeted, but it would be more work. Such a difference might arise from an unfair distribution of capacity at the connection level over at our test target. Possibly just using a single connection per worker with lots of threads/workers may suffice here to check this scenario.
Thank you. On the topic of doing this per connection: you are suggesting using a single connection per worker. How do workers utilize connections today? I don't fully follow the second imbalance you described.
Sorry that was not super clear, let me attempt to clarify:
We have one connection pool per worker, and we also report statistics per worker. So if one configures each worker to use a single connection, we effectively report statistics per connection as well.
@oschaaf agreed, per-worker stats and some outlier detection would be nice to have.
Perhaps it would be nice to add a feature that makes NH spotlight worker-local latencies/counters in the output when they diverge significantly from the global/aggregated view.
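To make the idea a bit more concrete, here is a rough sketch of what such a check could look like. It is not NH code: the function name, the input shape (a map of worker name to p99 latency in microseconds), and the 25% threshold are all made up for illustration, and the median across workers is used as a stand-in for the global/aggregated figure.

```python
from statistics import median


def flag_divergent_workers(worker_p99_us, rel_threshold=0.25):
    """Return {worker: relative deviation} for workers that stand out.

    Hypothetical helper: worker_p99_us maps a worker name to its p99
    latency in microseconds. The median across workers is an assumed
    stand-in for the global/aggregated value.
    """
    baseline = median(worker_p99_us.values())
    if baseline <= 0:
        return {}  # nothing sensible to compare against
    flagged = {}
    for name, value in worker_p99_us.items():
        deviation = (value - baseline) / baseline
        if abs(deviation) > rel_threshold:
            flagged[name] = deviation
    return flagged


# Made-up sample values: worker_3 is noticeably slower than its peers.
sample = {
    "worker_0": 1000.0,
    "worker_1": 1040.0,
    "worker_2": 980.0,
    "worker_3": 1600.0,
}

for name, dev in flag_divergent_workers(sample).items():
    print(f"warning: {name} p99 differs from the median worker by {dev:+.0%}")
```

With one connection per worker, the same check would effectively flag individual connections. A relative threshold is just the simplest option; something more robust to noisy runs may well be a better fit.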
/cc @jmarantz @htuch