elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

The EnrichCache has suboptimal performance on cache hits #96050

Open joegallo opened 1 year ago

joegallo commented 1 year ago

Here's a snippet from GET _nodes/stats for a simple example on my machine:

             {
                "enrich" : {
                  "type" : "enrich",
                  "stats" : {
                    "count" : 653312,
                    "time" : "29s",
                    "time_in_millis" : 29004,
                    "current" : 0,
                    "failed" : 0
                  }
                }
              },

That enrich processor is costing us about 44 microseconds per document (29,004 ms across 653,312 executions ≈ 44.4 µs each). But the dataset in question is actually super cacheable:

  "cache_stats" : [
    {
      "node_id" : "uLluSH_HT8SthV3I9LQPrQ",
      "count" : 100,
      "hits" : 652965,
      "misses" : 347,
      "evictions" : 0
    }
  ]

44 microseconds per document is way too high a performance price to pay for cache hits. So why's it so slow?

https://github.com/elastic/elasticsearch/blob/92ac03fcc5cdcf4ca611f34f96f21e048a378397/x-pack/plugin/enrich/src/main/java/org/elasticsearch/xpack/enrich/EnrichCache.java#L129-L151

We use the above class as the cache key so that we can potentially avoid executing SearchRequests (no need to execute a SearchRequest if we already know what it will return). But it's not really a great class for that job.

First, SearchRequest is mutable. We don't happen to mutate it, but still, it would be better if we were using a properly immutable class in the key.

Second, SearchRequest isn't some trivial class. There's a deep object tree there, so hashCode and equals actually have quite a bit of work to do: see the definition of hashCode/equals for SearchRequest, then click through to source (a SearchSourceBuilder) and scope out its hashCode/equals, then rinse and repeat. Also note that Objects.hash is a varargs call, so each invocation allocates an array unless the JIT manages to elide it.

[screenshot: Screen Shot 2023-05-11 at 4 55 47 PM]
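
As a rough sketch of the shape of the problem (simplified stand-ins for illustration only, not the real SearchRequest/SearchSourceBuilder code): every level of the tree pays for a varargs Objects.hash call, and every cache lookup re-walks the whole thing.

```java
import java.util.Arrays;
import java.util.Objects;

// Simplified stand-ins, illustration only -- not the real classes.
class FakeSourceBuilder {
    Object query;        // in reality, another deep builder tree
    Object fetchSource;
    Integer size;

    @Override
    public int hashCode() {
        // varargs call: allocates an Object[] every time (unless the JIT elides it)
        return Objects.hash(query, fetchSource, size);
    }
}

class FakeSearchRequest {
    String[] indices;
    String preference;
    FakeSourceBuilder source;

    @Override
    public int hashCode() {
        // recurses into source.hashCode(), so every lookup re-walks the whole tree
        return Objects.hash(Arrays.hashCode(indices), preference, source);
    }
}
```

(equals does the analogous per-field walk, so misses and hits both pay it.)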

Making matters worse, our caching implementation has to execute hashCode twice -- once to find the right underlying HashMap to talk to (arguably we control this; it happens via getCacheSegment), and then a second time within the implementation of HashMap itself (which we don't control).

[screenshot: Screen Shot 2023-05-11 at 4 44 20 PM]
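
Here's a stripped-down sketch of that double computation (a toy segmented cache, loosely shaped like the real Cache/CacheSegment structure; names simplified):

```java
import java.util.HashMap;
import java.util.Map;

// Toy segmented cache for illustration -- not the actual Cache implementation.
class SegmentedCache<K, V> {
    private static final int NUM_SEGMENTS = 256;

    @SuppressWarnings("unchecked")
    private final Map<K, V>[] segments = new Map[NUM_SEGMENTS];

    SegmentedCache() {
        for (int i = 0; i < NUM_SEGMENTS; i++) {
            segments[i] = new HashMap<>();
        }
    }

    private Map<K, V> getCacheSegment(K key) {
        // hashCode call #1: used to pick the segment
        return segments[key.hashCode() & (NUM_SEGMENTS - 1)];
    }

    V get(K key) {
        // hashCode call #2: HashMap.get() hashes the key again internally
        return getCacheSegment(key).get(key);
    }
}
```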

So even with a better key, it would probably still be worth pre-calculating the hashCode eagerly at construction time and then just returning it repeatedly, the same way Snapshot already does:

https://github.com/elastic/elasticsearch/blob/92ac03fcc5cdcf4ca611f34f96f21e048a378397/server/src/main/java/org/elasticsearch/snapshots/Snapshot.java#L76-L83
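
Applied to our cache key, that would look something like this (a sketch of the pattern, not the actual CacheKey class):

```java
// Sketch of an eagerly-hashed key, mirroring the Snapshot pattern linked above.
final class PrecomputedHashKey {
    private final Object wrapped;   // whatever the key is really built from
    private final int hashCode;     // computed once, at construction time

    PrecomputedHashKey(Object wrapped) {
        this.wrapped = wrapped;
        this.hashCode = wrapped.hashCode();  // pay the deep-tree cost exactly once
    }

    @Override
    public int hashCode() {
        return hashCode;  // every later call is just a field read
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof PrecomputedHashKey other && wrapped.equals(other.wrapped);
    }
}
```

That way the deep-tree hashing cost is paid once per key instead of on every get, put, and eviction.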

Meta: I went with >enhancement for this, but it's pretty close to >bug territory in my mind.

elasticsearchmachine commented 1 year ago

Pinging @elastic/es-data-management (Team:Data Management)

joegallo commented 1 year ago

Another tiny improvement we could make while we're in the area:

joegallo@galactic:~/Code/elastic/elasticsearch $ git diff upstream/main...enrich-cache-wip-2 -- x-pack/plugin/enrich/src/main/java/org/elasticsearch/xpack/enrich/AbstractEnrichProcessor.java
diff --git a/x-pack/plugin/enrich/src/main/java/org/elasticsearch/xpack/enrich/AbstractEnrichProcessor.java b/x-pack/plugin/enrich/src/main/java/org/elasticsearch/xpack/enrich/AbstractEnrichProcessor.java
index 114a26bbe83..c9a2ad5bc82 100644
--- a/x-pack/plugin/enrich/src/main/java/org/elasticsearch/xpack/enrich/AbstractEnrichProcessor.java
+++ b/x-pack/plugin/enrich/src/main/java/org/elasticsearch/xpack/enrich/AbstractEnrichProcessor.java
@@ -24,6 +24,7 @@ import java.util.function.BiConsumer;
 public abstract class AbstractEnrichProcessor extends AbstractProcessor {

     private final String policyName;
+    private final String enrichIndexBaseName;
     private final BiConsumer<SearchRequest, BiConsumer<List<Map<?, ?>>, Exception>> searchRunner;
     private final TemplateScript.Factory field;
     private final TemplateScript.Factory targetField;
@@ -46,6 +47,7 @@ public abstract class AbstractEnrichProcessor extends AbstractProcessor {
     ) {
         super(tag, description);
         this.policyName = policyName;
+        this.enrichIndexBaseName = EnrichPolicy.getBaseName(policyName);
         this.searchRunner = searchRunner;
         this.field = field;
         this.targetField = targetField;
@@ -77,7 +79,7 @@ public abstract class AbstractEnrichProcessor extends AbstractProcessor {
             searchBuilder.fetchSource(true);
             searchBuilder.query(constantScore);
             SearchRequest req = new SearchRequest();
-            req.indices(EnrichPolicy.getBaseName(getPolicyName()));
+            req.indices(enrichIndexBaseName);
             req.preference(Preference.LOCAL.type());
             req.source(searchBuilder);

Every little bit helps.

joegallo commented 1 year ago

If I'm reading this right, each CacheKey (in my little example) is about 600 bytes. Which isn't huge, but it isn't tiny either. Could we do better?

[screenshot: Screen Shot 2023-05-11 at 5 36 46 PM]
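
Purely as a thought experiment (this is not what the code does today), a key that captured only what actually distinguishes one lookup from another would be both smaller and cheaper to hash, e.g. the enrich index name plus the match value:

```java
// Hypothetical compact key -- illustrative only, not the current CacheKey.
record CompactCacheKey(String enrichIndexName, Object matchValue) {
    // The generated equals/hashCode only touch these two fields, so both the
    // per-entry footprint and the hashing cost stay small.
}
```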

And here are the cached values, each weighing in at about 1.3 KB and corresponding to our in-memory representation of something like the little doc snippet below:

[screenshot: Screen Shot 2023-05-11 at 5 43 19 PM]

        "_source" : {
          "zip" : 70116,
          "last_name" : "Brown",
          "state" : "LA",
          "city" : "New Orleans",
          "first_name" : "Mardy",
          "email" : "user1@asciidocsmith.com"
        }

edit: converting to immutable collections rather than unmodifiable wrappers around collections shaves 300 bytes off the results. double edit though: that's not safe if the keys or values can be null, and at least the values certainly can be...
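
For the record, that null-hostility is plain Java behavior: the immutable factories reject null keys and values, while the unmodifiable wrappers just pass them through.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class NullHostility {
    public static void main(String[] args) {
        Map<String, Object> source = new HashMap<>();
        source.put("email", null);  // enrich values can legitimately be null

        // Unmodifiable wrapper: fine, null values are allowed through.
        Map<String, Object> wrapped = Collections.unmodifiableMap(source);
        System.out.println(wrapped.get("email"));  // prints: null

        // Immutable copy: throws NullPointerException because of the null value.
        Map<String, Object> copied = Map.copyOf(source);
    }
}
```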

joegallo commented 1 year ago
    private void evictEntry(Entry<K, V> entry) {
        assert lruLock.isHeldByCurrentThread(); // lruLock held, amazing

        CacheSegment segment = getCacheSegment(entry.key); // lols, hashCode call inside here
        if (segment != null) {
            segment.remove(entry.key, entry.value, false);
        }
        delete(entry, RemovalNotification.RemovalReason.EVICTED);
    }
        void remove(K key, V value, boolean notify) {
            CompletableFuture<Entry<K, V>> future;
            boolean removed = false;
            try (ReleasableLock ignored = writeLock.acquire()) { // writeLock held, amazing
                future = map == null ? null : map.get(key);  // lols, hashCode call inside here
                try {
                    if (future != null) {
                        if (future.isDone()) {
                            Entry<K, V> entry = future.get();
                            if (Objects.equals(value, entry.value)) {
                                removed = map.remove(key, future);  // lols, hashCode call inside here
                                if (map.isEmpty()) {
                                    map = null;
                                }

OMG. @jbaiera and I are talking about this, and he realized that we're calling hashCode three times in the evictEntry path. For two of those calls we're holding the write lock, and for all three we're holding the lru lock.

He and I are going to walk into a lake now.

(edit: it's not that holding locks while computing hashCode is inherently bad; it's more that doing so assumes the hashCode computation is nearly instantaneous, which in this case it most certainly is not.)