elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Search responses with large size can cause OOMs #110962

Open carlosdelest opened 3 months ago

carlosdelest commented 3 months ago

Elasticsearch Version

7.17.x - 8.x

Installed Plugins

No response

Java Version

bundled

OS Version

N/A

Problem Description

Search responses that contain a large number of hits, or hits that are individually large, can cause a node to run out of memory (OOM), with the following stack trace:

elasticsearch[node_name][transport_worker][T#XXXX]
  at java.lang.OutOfMemoryError.<init>()V (OutOfMemoryError.java:48)
  at org.elasticsearch.common.io.stream.StreamInput.readBytesReference(I)Lorg/elasticsearch/common/bytes/BytesReference; (StreamInput.java:161)
  at org.elasticsearch.common.io.stream.StreamInput.readBytesReference()Lorg/elasticsearch/common/bytes/BytesReference; (StreamInput.java:127)
  at org.elasticsearch.search.SearchHit.<init>(Lorg/elasticsearch/common/io/stream/StreamInput;)V (SearchHit.java:150)
  at org.elasticsearch.search.SearchHits.<init>(Lorg/elasticsearch/common/io/stream/StreamInput;)V (SearchHits.java:90)
  at org.elasticsearch.search.fetch.FetchSearchResult.<init>(Lorg/elasticsearch/common/io/stream/StreamInput;)V (FetchSearchResult.java:42)
  at org.elasticsearch.search.fetch.QueryFetchSearchResult.<init>(Lorg/elasticsearch/common/io/stream/StreamInput;)V (QueryFetchSearchResult.java:28)
  at org.elasticsearch.action.search.SearchTransportService$$Lambda$6076+0x0000000801b1cc88.read(Lorg/elasticsearch/common/io/stream/StreamInput;)Ljava/lang/Object; ()
  at org.elasticsearch.action.ActionListenerResponseHandler.read(Lorg/elasticsearch/common/io/stream/StreamInput;)Lorg/elasticsearch/transport/TransportResponse; (ActionListenerResponseHandler.java:58)
  at org.elasticsearch.action.ActionListenerResponseHandler.read(Lorg/elasticsearch/common/io/stream/StreamInput;)Ljava/lang/Object; (ActionListenerResponseHandler.java:25)
  at org.elasticsearch.transport.TransportService$4.read(Lorg/elasticsearch/common/io/stream/StreamInput;)Lorg/elasticsearch/transport/TransportResponse; (TransportService.java:863)
  at org.elasticsearch.transport.TransportService$4.read(Lorg/elasticsearch/common/io/stream/StreamInput;)Ljava/lang/Object; (TransportService.java:843)
  at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.read(Lorg/elasticsearch/common/io/stream/StreamInput;)Lorg/elasticsearch/transport/TransportResponse; (TransportService.java:1462)
  at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.read(Lorg/elasticsearch/common/io/stream/StreamInput;)Ljava/lang/Object; (TransportService.java:1449)
  at org.elasticsearch.transport.InboundHandler.handleResponse(Ljava/net/InetSocketAddress;Lorg/elasticsearch/common/io/stream/StreamInput;Lorg/elasticsearch/transport/TransportResponseHandler;)V (InboundHandler.java:311)
  at org.elasticsearch.transport.InboundHandler.messageReceived(Lorg/elasticsearch/transport/TcpChannel;Lorg/elasticsearch/transport/InboundMessage;J)V (InboundHandler.java:134)

Large search responses should be cancelled rather than OOM the node.
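
To illustrate the expected behaviour, here is a minimal sketch of checking a length prefix against a memory budget before allocating, so that an oversized response fails with an exception instead of exhausting the heap. All names here (`MemoryBudget`, `BoundedReader`) are hypothetical and the wire format is simplified to a 4-byte length prefix; this is not Elasticsearch's `StreamInput` or circuit breaker code.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; not Elasticsearch's actual classes.
final class MemoryBudget {
    private final long limitBytes;
    private final AtomicLong usedBytes = new AtomicLong();

    MemoryBudget(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    // Reserve bytes before allocating; fail fast if the budget would be exceeded.
    void reserve(long bytes) {
        long newUsed = usedBytes.addAndGet(bytes);
        if (newUsed > limitBytes) {
            usedBytes.addAndGet(-bytes); // roll back the failed reservation
            throw new IllegalStateException(
                "reading " + bytes + " bytes would exceed the budget of " + limitBytes + " bytes");
        }
    }

    void release(long bytes) {
        usedBytes.addAndGet(-bytes);
    }

    long used() {
        return usedBytes.get();
    }
}

final class BoundedReader {
    // Reads a length-prefixed byte array, checking the budget before the allocation happens.
    static byte[] readBytes(DataInputStream in, MemoryBudget budget) throws IOException {
        int length = in.readInt();
        if (length < 0) {
            throw new IOException("negative length: " + length);
        }
        budget.reserve(length); // throws before any large array is allocated
        try {
            byte[] bytes = new byte[length];
            in.readFully(bytes);
            return bytes;
        } catch (IOException | RuntimeException e) {
            budget.release(length); // give the reservation back if the read fails
            throw e;
        }
    }
}
```

In this sketch the caller is responsible for calling `release` once the returned bytes are no longer needed, and for turning the `IllegalStateException` into a failure of the search request.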

Steps to Reproduce

This was observed in production; we don't have a script that reproduces it.

Logs (if relevant)

No response

elasticsearchmachine commented 3 months ago

Pinging @elastic/es-search-foundations (Team:Search Foundations)

original-brownbear commented 3 months ago

Just one thing to note here: the bug as seen in this stack trace has long been fixed by reading into pooled byte arrays. However, we still have a number of other spots where we are not yet using pooled bytes.
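
For readers unfamiliar with the idea, a rough sketch of reading into pooled byte pages might look like the following. It assumes a hypothetical fixed-size page pool (`PagePool`, `PagedBytes`) and is not the actual Elasticsearch implementation; it only shows the general technique of filling recycled fixed-size pages instead of allocating one large contiguous array per message.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Elasticsearch's actual classes.
final class PagePool {
    private final int pageSize;
    private final int maxPooled;
    private final ArrayDeque<byte[]> free = new ArrayDeque<>();

    PagePool(int pageSize, int maxPooled) {
        this.pageSize = pageSize;
        this.maxPooled = maxPooled;
    }

    // Hand out a recycled page if one is available, otherwise allocate a new one.
    synchronized byte[] acquire() {
        byte[] page = free.poll();
        return page != null ? page : new byte[pageSize];
    }

    // Return a page to the pool; pages beyond the cap are dropped for the GC.
    synchronized void release(byte[] page) {
        if (page.length == pageSize && free.size() < maxPooled) {
            free.push(page);
        }
    }

    synchronized int pooled() {
        return free.size();
    }
}

final class PagedBytes implements AutoCloseable {
    private final PagePool pool;
    private final List<byte[]> pages = new ArrayList<>();
    private long length;

    private PagedBytes(PagePool pool) {
        this.pool = pool;
    }

    // Reads `length` bytes into pooled pages rather than a single byte[length].
    static PagedBytes readFrom(InputStream in, long length, PagePool pool) throws IOException {
        PagedBytes result = new PagedBytes(pool);
        try {
            long remaining = length;
            while (remaining > 0) {
                byte[] page = pool.acquire();
                result.pages.add(page);
                int toRead = (int) Math.min(page.length, remaining);
                int n = in.readNBytes(page, 0, toRead);
                if (n < toRead) {
                    throw new EOFException("stream ended with " + (remaining - n) + " bytes missing");
                }
                remaining -= n;
            }
            result.length = length;
            return result;
        } catch (IOException | RuntimeException e) {
            result.close(); // return any pages acquired so far
            throw e;
        }
    }

    long length() {
        return length;
    }

    int pageCount() {
        return pages.size();
    }

    byte get(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
        int pageSize = pages.get(0).length;
        return pages.get((int) (index / pageSize))[(int) (index % pageSize)];
    }

    // Releasing is explicit: pages go back to the pool for the next message.
    @Override
    public void close() {
        for (byte[] page : pages) {
            pool.release(page);
        }
        pages.clear();
        length = 0;
    }
}
```

The point of the pattern is that pages are reused across messages, which avoids repeated large allocations; the cost is that every reader has to release the bytes explicitly when done.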