elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
69.59k stars 24.63k forks source link

Cancellation checks during bulk scoring add significant runtime cost #104273

Closed original-brownbear closed 2 months ago

original-brownbear commented 8 months ago

From profiling the scroll phase of the http_logs rally track:

image

These checks are far more expensive than the docCount method call itself. We also have other code where we call ExitablePointValues back-to-back to e.g. compare it's size() and getDocCount() return, thus running redundant checks.

We could optimise the checks itself, but like we did elsewhere it's probably better to trade cancellation liveness for performance here?

elasticsearchmachine commented 8 months ago

Pinging @elastic/es-search (Team:Search)

DaveCTurner commented 8 months ago

Doing the check less often seems ok (if easy to do without losing too much promptness) but it does look like there's some easy optimisations to the check itself. The flamegraph indicates that most of the time is just iterating the HashSet? Does this need to be a HashSet? We fill it with lambdas, there's not going to be any duplicates in there are there?

Can you share an expanded flamegraph for the check itself?

original-brownbear commented 8 months ago
image

Here you go I don't think there's much you can optimize here. I think we could move to an ArrayList for the data structure for sure (as you say it's only lambdas going in there so there's never any equality anyway), but if you look at the graph everything is inlining 100% already. The cost really isn't in the code but in the fetching of the data I'd guess (could check that later). And really, even if we optimised this massively by say 80%, it's still too much overhead IMO. I bet this could be refactored in a way that makes this check a single volatile read (but at quite some cost in work and come complexity) by running the actual check on another thread for timeouts and pushing flag cancellation checks via a callback mechanism ... but really, it seems more reasonable to just find a way to not check in hot loops :) That should be less work and give a much more predictable performance return.

... that said, let me open a PR for the list real quick, better to save 2% now than not :P

edit: https://github.com/elastic/elasticsearch/pull/104304

DaveCTurner commented 8 months ago

Hmm ensureNotCancelled shouldn't be that expensive either, see #104305.

benwtrent commented 2 months ago

@original-brownbear can we close this? Or is there still an issue here?

elasticsearchmachine commented 2 months ago

Pinging @elastic/es-search-foundations (Team:Search Foundations)

original-brownbear commented 2 months ago

@benwtrent it's cheaper now but still quite visible in profiling last time I checked, I believe this shouldn't show up at all ideally. I'll try to remember updating the flame graph so people can understand the priority better.

original-brownbear commented 2 months ago

Jup this can be closed, the cancellation check isn't even visible in flame graphs anymore :)

image