elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
1.18k stars 24.85k forks source link

Avoid opening outbound connections during shutdown #77539

Open DaveCTurner opened 3 years ago

DaveCTurner commented 3 years ago

To answer the question raised in this comment I started to look at whether we could block on the closeLock and I think the answer is no but also I now have more questions about concurrency during closing. Can we definitely not add to acceptedChannels after the server channel is closed? Also this comment:

https://github.com/elastic/elasticsearch/blob/e6fd459a6e275efa663af0b711c87da3b18d8007/server/src/main/java/org/elasticsearch/transport/TcpTransport.java#L282

Does it really do that? It did in the past when we blocked the thread while the connection completes, but today I think it's possible for us to open a connection and then leak it by shutting the event loop down. It'll be ok for connections that are happening via ClusterConnectionManager#connectToNode since we closed all the connection managers down first, but we bypass that mechanism for discovery and sniffing.

Not sure there's really a bug here, I haven't tried to reproduce it or anything. If there is a bug then it's benign in production anyway since we're shutting down so everything will be cleaned up soon, but it's still untidy and might cause test issues.

elasticmachine commented 3 years ago

Pinging @elastic/es-distributed (Team:Distributed)

DaveCTurner commented 3 years ago

I think we hit exactly this problem in https://gradle-enterprise.elastic.co/s/3lq4o3iswxumo/console-log?task=:server:test which seems like an odd coincidence since this isn't a new bug and yet I've not seen it result in test failures ever before.

DaveCTurner commented 2 years ago

Note that #86315 means we no longer need to protect against connection attempts while closing. Should we just remove this (buggy) protection mechanism instead of trying to fix it?