elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

A mixed cluster failure due to leak detected in logs #109476

Closed pgomulka closed 2 months ago

pgomulka commented 3 months ago

CI Link

https://gradle-enterprise.elastic.co/s/5752xbvcac4na/failure#1

Repro line

n/a

Does it reproduce?

Didn't try

Applicable branches

main 8.13.4

Failure history

No response

Failure excerpt

A mixed cluster test failed because a resource leak was detected: https://gradle-enterprise.elastic.co/s/5752xbvcac4na/failure#1

[2024-06-07T12:17:17,259][ERROR][o.e.t.LeakTracker        ] [v8.13.4-2] LEAK: resource was not cleaned up before it was garbage-collected.
»  Recent access records: 
»  #1:
»   org.elasticsearch.server@8.13.4/org.elasticsearch.search.SearchHits.deallocate(SearchHits.java:253)
»   org.elasticsearch.server@8.13.4/org.elasticsearch.search.SearchHits.decRef(SearchHits.java:244)
»   org.elasticsearch.server@8.13.4/org.elasticsearch.search.fetch.FetchSearchResult.deallocate(FetchSearchResult.java:116)
»   org.elasticsearch.server@8.13.4/org.elasticsearch.search.fetch.FetchSearchResult.decRef(FetchSearchResult.java:108)
»   org.elasticsearch.server@8.13.4/org.elasticsearch.transport.InboundHandler.doHandleResponse(InboundHandler.java:437)
»   org.elasticsearch.server@8.13.4/org.elasticsearch.transport.InboundHandler.handleResponse(InboundHandler.java:382)
»   org.elasticsearch.server@8.13.4/org.elasticsearch.transport.InboundHandler.executeResponseHandler(InboundHandler.java:147)
»   org.elasticsearch.server@8.13.4/org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:122)
»   org.elasticsearch.server@8.13.4/org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:96)
»   org.elasticsearch.server@8.13.4/org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:821)
»   org.elasticsearch.server@8.13.4/org.elasticsearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:124)
»   org.elasticsearch.server@8.13.4/org.elasticsearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:96)
»   org.elasticsearch.server@8.13.4/org.elasticsearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:61)
»   org.elasticsearch.transport.netty4@8.13.4/org.elasticsearch.transport.netty4.Netty4MessageInboundHandler.channelRead(Netty4MessageInboundHandler.java:48)
»   io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
»   io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
»   io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
»   io.netty.codec@4.1.94.Final/io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
»   io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
»   io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
»   io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
»   io.netty.transport@4.1.94.Final/io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
»   io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
»   io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
»   io.netty.transport@4.1.94.Final/io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
»   io.netty.transport@4.1.94.Final/io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
»   io.netty.transport@4.1.94.Final/io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
»   io.netty.transport@4.1.94.Final/io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:689)
»   io.netty.transport@4.1.94.Final/io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:652)
»   io.netty.transport@4.1.94.Final/io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
»   and more..

elasticsearchmachine commented 3 months ago

Pinging @elastic/es-core-infra (Team:Core/Infra)

rjernst commented 3 months ago

This looks related to search hits, so passing to the search team.

elasticsearchmachine commented 3 months ago

Pinging @elastic/es-search (Team:Search)

benwtrent commented 3 months ago

I think this is the bug that @original-brownbear fixed in 8.14 but that wasn't backported to 8.13, so our integration tests hit it.

https://github.com/elastic/elasticsearch/pull/108562
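
For anyone unfamiliar with the `LEAK` message in the trace above: it generally indicates that a reference-counted object was garbage-collected while its count was still above zero. Below is a minimal Java sketch of that counting pattern. It is not the Elasticsearch implementation; `RefCountDemo` and `RefCountedResource` are hypothetical names that only loosely mirror the `decRef`/`deallocate` frames in the trace.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RefCountDemo {

    /** Hypothetical ref-counted resource; an illustration, not the Elasticsearch class. */
    public static class RefCountedResource {
        // The creator implicitly holds the first reference.
        private final AtomicInteger refs = new AtomicInteger(1);
        private volatile boolean deallocated = false;

        /** Takes an additional reference; fails if the resource was already released. */
        public void incRef() {
            int current;
            do {
                current = refs.get();
                if (current <= 0) {
                    throw new IllegalStateException("already released");
                }
            } while (!refs.compareAndSet(current, current + 1));
        }

        /** Drops one reference; returns true if this call released the resource. */
        public boolean decRef() {
            int remaining = refs.decrementAndGet();
            if (remaining < 0) {
                throw new IllegalStateException("released more often than acquired");
            }
            if (remaining == 0) {
                deallocate();
                return true;
            }
            return false;
        }

        private void deallocate() {
            deallocated = true; // a real resource would free buffers here
        }

        public boolean isDeallocated() {
            return deallocated;
        }

        public int refCount() {
            return refs.get();
        }
    }

    public static void main(String[] args) {
        // Balanced usage: every incRef and the initial reference are matched by a decRef.
        RefCountedResource balanced = new RefCountedResource();
        balanced.incRef();
        balanced.decRef();
        balanced.decRef();
        System.out.println("balanced deallocated: " + balanced.isDeallocated());

        // Unbalanced usage: the initial reference is never released, so the count
        // stays at 1. A leak tracker would report this once the object is collected.
        RefCountedResource leaked = new RefCountedResource();
        leaked.incRef();
        leaked.decRef();
        System.out.println("leaked deallocated: " + leaked.isDeallocated()
            + ", refs=" + leaked.refCount());
    }
}
```

In the second case nothing fails at the call site, which is why this kind of bug tends to surface only later in the logs.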

rjernst commented 3 months ago

Another example of this: https://gradle-enterprise.elastic.co/s/333wraumu24co/console-log?page=3

Excerpt:

[2024-06-11T13:22:55,974][ERROR][o.e.t.LeakTracker        ] [v8.13.4-3] LEAK: resource was not cleaned up before it was garbage-collected.
»  Recent access records:
»  #1:
»   org.elasticsearch.server@8.13.4/org.elasticsearch.action.search.SearchResponse.decRef(SearchResponse.java:231)
»   org.elasticsearch.server@8.13.4/org.elasticsearch.action.search.MultiSearchResponse.deallocate(MultiSearchResponse.java:166)
»   org.elasticsearch.server@8.13.4/org.elasticsearch.action.search.MultiSearchResponse.decRef(MultiSearchResponse.java:155) 
»   org.elasticsearch.server@8.13.4/org.elasticsearch.action.ActionListener.respondAndRelease(ActionListener.java:291) 
»   org.elasticsearch.server@8.13.4/org.elasticsearch.action.search.TransportMultiSearchAction$1.finish(TransportMultiSearchAction.java:190) 
»   org.elasticsearch.server@8.13.4/org.elasticsearch.action.search.TransportMultiSearchAction$1.handleResponse(TransportMultiSearchAction.java:176) 
»   org.elasticsearch.server@8.13.4/org.elasticsearch.action.search.TransportMultiSearchAction$1.onResponse(TransportMultiSearchAction.java:161) 
»   org.elasticsearch.server@8.13.4/org.elasticsearch.action.search.TransportMultiSearchAction$1.onResponse(TransportMultiSearchAction.java:157) 
»   org.elasticsearch.server@8.13.4/org.elasticsearch.action.ActionListener$3.onResponse(ActionListener.java:314)

@benwtrent @original-brownbear The above failure seems to occur several times per day in main and PRs. Is backporting the mentioned PR to 8.13 doable?

benwtrent commented 3 months ago

@rjernst https://github.com/elastic/elasticsearch/pull/108562 is backported, but since there was never an 8.13.5 release, the bugfix version of 8.13.4 doesn't have the commit.

I am not sure what to do here.

I don't know which particular test is failing. If I did, we could just mute that test for this particular version.

benwtrent commented 3 months ago

@rjernst I will open a PR to mute all collapse YAML tests when run against 8.13.x.

benwtrent commented 2 months ago

I have muted the affected tests for versions < 8.14. Hopefully these failures disappear. Closing issue.
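
The "< 8.14" condition above boils down to a version comparison against the old cluster's version. Here is a rough Java sketch of that check, assuming plain `major.minor.patch` version strings. `MuteByVersion`, `parse`, `isBefore`, and `shouldMute` are hypothetical helpers for illustration, and this is not how the YAML test runner actually expresses the skip.

```java
public class MuteByVersion {

    /** Parses "major.minor.patch" into three ints; hypothetical helper. */
    public static int[] parse(String version) {
        String[] parts = version.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected major.minor.patch, got: " + version);
        }
        return new int[] {
            Integer.parseInt(parts[0]),
            Integer.parseInt(parts[1]),
            Integer.parseInt(parts[2])
        };
    }

    /** Returns true if {@code version} is strictly older than {@code bound}. */
    public static boolean isBefore(String version, String bound) {
        int[] a = parse(version);
        int[] b = parse(bound);
        for (int i = 0; i < 3; i++) {
            if (a[i] != b[i]) {
                return a[i] < b[i];
            }
        }
        return false; // equal versions are not "before"
    }

    /** Assumed rule from this thread: mute when the old node is older than 8.14. */
    public static boolean shouldMute(String oldClusterVersion) {
        return isBefore(oldClusterVersion, "8.14.0");
    }

    public static void main(String[] args) {
        System.out.println("8.13.4 muted: " + shouldMute("8.13.4"));
        System.out.println("8.14.0 muted: " + shouldMute("8.14.0"));
    }
}
```

Comparing the components numerically rather than as strings matters here, since a string comparison would order "8.9.0" after "8.13.4".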