Closed: mlyczek closed this issue 2 months ago.
Thanks for all of the work in narrowing this down! Do you know if the problem happens with es-hadoop versions older than 8.6 with elasticsearch versions 8.6 or newer? That is, given that es-hadoop has not changed much lately, I'm assuming that the bug is actually in elasticsearch (rather than es-hadoop). But it would be good to have confirmation of that.
> Do you know if the problem happens with es-hadoop versions older than 8.6 with elasticsearch versions 8.6 or newer?
Actually you already answered that question by providing a python script that hits elasticsearch directly, so disregard my question!
> Do you know if the problem happens with es-hadoop versions older than 8.6 with elasticsearch versions 8.6 or newer?
I've just run the check using es-hadoop 8.0.0 against Elasticsearch 8.6.0 and the issue also occurs. We noticed it while upgrading our Elasticsearch clusters to 8.12.1.
As for where the bug is, to be honest, I considered creating an issue in Elasticsearch's repository as well, but I thought it would be better to report it here first, so that you are aware of this change in Elasticsearch behaviour and can decide where it is best to fix it (either in Elasticsearch or in es-hadoop).
I'm happy to provide any further detail that might help you, or to report this issue to the main Elasticsearch repository as well.
It looks like the bug is a race condition in es-hadoop that has been there for years, and was probably made more likely when the desired-balance allocator (or some change related to it) was introduced in 8.6.0. Here's what es-hadoop does when indexing a new document:

1. Check whether the index exists.
2. If it does not exist, create it and wait for the index health to reach YELLOW (wait_for_status=YELLOW).
3. Query _search_shards for the index's shards and pick a node to write to.
The problem is that if two Spark tasks are inserting documents into a nonexistent index at the same time, the first task does all the steps above. The second task comes in and sees that the index exists, so it skips the second step (critically, the wait_for_status=YELLOW check). It then immediately queries for the shards, which might not exist yet because the index is still being created.
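In rough Python terms, the racy flow looks like this (a paraphrase of the logic described above using direct HTTP calls via the requests library, not the actual es-hadoop code; the cluster address is illustrative):

```python
import requests

ES = "http://localhost:9200"  # illustrative cluster address

def init_single_index(index):
    # Step 1: does the index exist?  (HEAD /<index>)
    if requests.head(f"{ES}/{index}").status_code != 200:
        # Step 2: create the index and wait for YELLOW health.
        requests.put(f"{ES}/{index}")
        requests.get(f"{ES}/_cluster/health/{index}",
                     params={"wait_for_status": "yellow"})
    # Step 3: fetch the index's shards.  A task that saw the index as
    # already existing skips step 2, and can get here while the index is
    # still being created: the shard list comes back empty and the
    # assertion fails.
    shards = requests.get(f"{ES}/{index}/_search_shards").json()["shards"]
    assert shards, f"no shards found for index {index}"
    return shards
```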
It looks like we're only using the shards to get a node to write to. And there's no guarantee that the shards returned by _search_shards will even be there when we actually go to write. So the best change here might be to just pick a node at random if the list of shards is empty.
> If I can recommend a solution I would vote for:
> 1. if the set of shards is empty, wait for YELLOW and get the shards again - now it should pass
There's no guarantee that shards won't move immediately after we fetch them.
> 2. throw an exception if the second attempt to get the shards fails
But there's also no real harm in letting elasticsearch deal with this. If there is some real problem allocating shards at all, then elasticsearch is going to let us know anyway.
For those reasons, I'm just returning a random node if the shard check fails. This will probably almost always be hit only when two splits are trying to create an index at the same time.
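A minimal sketch of that fallback (a Python paraphrase; the actual fix is in es-hadoop's Java code, and these names are made up):

```python
import random

def choose_write_node(shards, all_nodes):
    # `shards` is the (possibly empty) list returned by _search_shards;
    # `all_nodes` is the list of known data nodes.
    if shards:
        # Normal case: write to a node hosting one of the index's shards.
        return shards[0][0]["node"]
    # Empty shard list: the index is most likely still being created by a
    # concurrent task, so pick any node and let Elasticsearch route the
    # request internally.
    return random.choice(all_nodes)
```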
Issue description
The automatic index creation feature in elasticsearch-spark has stopped working with Elasticsearch clusters starting from version 8.6.0.
When using the default saving mode, which saves documents directly to Elasticsearch shards, and Spark creates more than one task to save those documents into Elasticsearch, at least one task fails with an exception like the one in the "Stack trace" section below.
I have spent (together with my colleagues) some time investigating this issue and we have found the following:
- HEAD /<name-of-index> returns HTTP status 200, meaning that the index exists
- GET /<name-of-index>/_search_shards for some short period of time returns an empty list of shards, but this period (despite being short) is long enough to cause the assertion in RestService.initSingleIndex() to fail

Steps to reproduce

- es.nodes.wan.only and es.nodes.client.only set to false (default)

Code:
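For reference, a minimal PySpark sketch of the kind of job that triggers the failure (the index name and node address are illustrative; the linked project below is the complete, runnable version):

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("es-hadoop-concurrency-repro")
    .getOrCreate()
)

# Enough partitions that Spark runs several write tasks concurrently.
df = spark.range(0, 100_000).repartition(8)

# The target index does not exist yet, so the tasks race to create it.
(
    df.write
    .format("org.elasticsearch.spark.sql")
    .option("es.nodes", "localhost")
    .option("es.resource", "concurrency-test-index")  # illustrative name
    .mode("append")
    .save()
)
```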
See https://github.com/mlyczek/elasticsearch-hadoop-concurrency-issue for a full project ready to run.
Stack trace
Full log: full-log.txt
More investigation information
I have prepared a small Python script to show the inconsistent behaviour of Elasticsearch. It more or less mimics the behaviour of RestService.initSingleIndex(): it checks whether the index exists; if not, it creates the index, waits for YELLOW status and then gets the shards; if the index does exist, it gets the shards right away.
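A sketch of such a script, using the requests library against a local cluster with security disabled (the index name and sleep time are illustrative):

```python
import threading
import time
import requests

ES = "http://localhost:9200"  # illustrative; assumes security disabled
INDEX = "concurrency-test"    # illustrative index name

def check(name):
    # Mimic RestService.initSingleIndex():
    if requests.head(f"{ES}/{INDEX}").status_code != 200:
        # Index does not exist: create it and wait for YELLOW status.
        requests.put(f"{ES}/{INDEX}")
        requests.get(f"{ES}/_cluster/health/{INDEX}",
                     params={"wait_for_status": "yellow"})
        shards = requests.get(f"{ES}/{INDEX}/_search_shards").json()["shards"]
        print(name, "created index, shards:", shards)
    else:
        # Index already exists: get the shards right away.
        shards = requests.get(f"{ES}/{INDEX}/_search_shards").json()["shards"]
        print(name, "index exists, shards:", shards)
        # Adjust this sleep per machine to catch the window in which the
        # index "exists" but its shard list is still empty.
        time.sleep(0.01)
        shards = requests.get(f"{ES}/{INDEX}/_search_shards").json()["shards"]
        print(name, "shards after sleep:", shards)

threads = [threading.Thread(target=check, args=(f"thread-{i}",)) for i in (1, 2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```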
Below is one of the outputs that I got after running the above script (it is necessary to adjust the sleep time at the end, depending on the machine, to catch the short period of time during which the issue exists):
In the above log, it can be seen that while one thread is waiting for Elasticsearch to finish creating the index, the other thread gets information that the index exists but the list of shards is empty; after sleeping 0.01 seconds, the list of shards is returned correctly.
Version Info
OS: Linux
JVM: OpenJDK 17.0.9
Spark: 3.3.2
ES-Spark: 8.13.1
ES: >= 8.6.0