Netflix / suro

Netflix's distributed Data Pipeline
Apache License 2.0
794 stars, 170 forks

ElasticSearchSink dynamic server list update with Eureka when client.transport.sniff is false #165

Closed by metacret 9 years ago

metacret commented 9 years ago

We observed incorrect routing from ElasticSearchSink when we scaled up a cluster. For example, suppose we have two ES clusters, es0 and es1, with es_sink_0 talking to es0 and es_sink_1 talking to es1. After we scale up es1, es_sink_0 sometimes sends data to es1. We use client.transport.sniff=true by default. In theory this should not happen: TransportClient refreshes its server list by communicating with the cluster, and new nodes should not join the wrong cluster.

We haven't found the root cause yet, but this is a really serious problem. So, as a temporary measure, I want to turn off sniffing and add a feature that manually updates the server list through the Eureka client.
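A rough sketch of what the manual update could look like, assuming the sink periodically gets the current host list from Eureka and diffs it against what the client already knows. This is not the actual Suro implementation. `ServerListUpdater` and `AddressRegistry` are hypothetical names; `AddressRegistry` stands in for the add/remove transport address calls on the Elasticsearch client, and the "host:port" strings stand in for whatever the Eureka lookup returns.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Sketch only: keeps a client's server list in sync with a discovery service.
 * All names here are hypothetical and not part of Suro, Eureka, or Elasticsearch.
 */
class ServerListUpdater {
    /** Hypothetical stand-in for adding/removing transport addresses on the ES client. */
    interface AddressRegistry {
        void add(String hostPort);
        void remove(String hostPort);
    }

    private final AddressRegistry registry;
    private final Set<String> current = new HashSet<>();

    ServerListUpdater(AddressRegistry registry) {
        this.registry = registry;
    }

    /**
     * Apply the latest list from discovery (assumed: the instances reported as UP).
     * A null or empty list is ignored so a discovery outage does not strip every server.
     */
    synchronized void update(Collection<String> discovered) {
        if (discovered == null || discovered.isEmpty()) {
            return;
        }
        Set<String> latest = new HashSet<>(discovered);

        // Add new servers first so the client is never left with an empty list.
        for (String server : latest) {
            if (current.add(server)) {
                registry.add(server);
            }
        }

        // Then drop servers that discovery no longer reports.
        List<String> stale = new ArrayList<>();
        for (String server : current) {
            if (!latest.contains(server)) {
                stale.add(server);
            }
        }
        for (String server : stale) {
            current.remove(server);
            registry.remove(server);
        }
    }

    /** Snapshot of the servers the updater currently tracks. */
    synchronized Set<String> currentServers() {
        return new HashSet<>(current);
    }
}
```

The idea would be to call `update(...)` from a scheduled task with the host list for the one cluster a given sink is configured for, so es_sink_0 only ever learns es0 hosts. Ignoring an empty result is a design choice, not a requirement; it trades slower removal of dead nodes for not losing all servers when discovery is briefly unavailable.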

metacret commented 9 years ago

Looks like a TransportClient bug. I have been seeing a bunch of log lines like the following since the data got screwed up.

2014-12-19 06:58:08,187 WARN elasticsearch[Sun Girl][generic][T#49358] transport - [Sun Girl] node null not part of the cluster Cluster [es_logsummary], ignoring...