elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Indices exists returns false during recovery #8105

Closed sfussenegger closed 8 years ago

sfussenegger commented 10 years ago

I have a problem with the embedded Java client (used for testing) after moving our Java integration library from 1.0 to 1.4.0.Beta.

During application restart, we check whether an index exists and then create or update it. Now the indices exists check returns false for an index that is still there, so the subsequent create index call fails.

I asked on IRC for a solution, and @karmi thought it was probably a bug.

(test case to follow)

sfussenegger commented 9 years ago

Ok, here's a test case, although further discussion with @karmi suggested it's working as designed.

Nevertheless, running the linked code twice results in org.elasticsearch.indices.IndexAlreadyExistsException: [foobar] already exists. (It's reliably reproducible for me, but since it's a race condition, results may differ.)

I've tested several versions of ES, and it turned out that 1.3.0 was the version that introduced the change.

A viable workaround is to wait for the cluster health status before checking for existence (yellow is sufficient).
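To make the race and the workaround concrete, here is a minimal self-contained Java simulation. It does not use the Elasticsearch client: RecoveringNode, indexExists and waitForRecovery are hypothetical names, and the CountDownLatch stands in for the cluster reaching yellow. In the real Java client the wait would be a cluster health request using setWaitForYellowStatus() issued before the indices exists call; this sketch only models the ordering, not the actual recovery mechanics.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a restarted node: persisted index metadata becomes
// visible only after "recovery" finishes on a background thread.
class RecoveringNode {
    private final Set<String> recoveredIndices = ConcurrentHashMap.newKeySet();
    private final CountDownLatch recovered = new CountDownLatch(1);

    RecoveringNode(Set<String> persistedIndices, long recoveryMillis) {
        Thread recovery = new Thread(() -> {
            try {
                Thread.sleep(recoveryMillis); // simulated recovery work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            recoveredIndices.addAll(persistedIndices);
            recovered.countDown(); // plays the role of the cluster reaching "yellow"
        });
        recovery.setDaemon(true);
        recovery.start();
    }

    // Like the exists check in this issue: answers from whatever state is
    // visible right now, so it can return false while recovery is running.
    boolean indexExists(String name) {
        return recoveredIndices.contains(name);
    }

    // Analogue of waiting for yellow: blocks until recovery completes.
    // Returns false on timeout or interruption.
    boolean waitForRecovery(long timeout, TimeUnit unit) {
        try {
            return recovered.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}

class WaitBeforeExists {
    public static void main(String[] args) {
        RecoveringNode node = new RecoveringNode(Set.of("foobar"), 200);
        // Racy: may report false even though "foobar" was persisted.
        System.out.println("exists before wait: " + node.indexExists("foobar"));
        if (!node.waitForRecovery(5, TimeUnit.SECONDS)) {
            throw new IllegalStateException("recovery did not finish in time");
        }
        // After the wait, the answer reflects the recovered state.
        System.out.println("exists after wait:  " + node.indexExists("foobar"));
    }
}
```

Checking before the wait reproduces the false negative reported above; gating the check on the latch (yellow, in the real cluster) removes it for this particular ordering.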

Not sure if this is a bug or by design, so feel free to proceed as you see fit.

clintongormley commented 9 years ago

I'd be interested in seeing what change in 1.3 is causing this too...

zakmagnus commented 9 years ago

I have also run into this issue. I'm trying to see if I can reproduce it outside of local mode. If I can, I think it would definitely be a big problem. I don't think "wait for yellow state" is an acceptable solution when using ES as a remote system, because the cluster can go down and come back up, or change in any other arbitrary way, between the "wait for yellow" request and the request that does the actual work.
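One way to sidestep the check-then-act gap described here is to not rely on a separate exists check at all: attempt the create, and treat "already exists" as the signal to take the update path. The Java sketch below models this with a hypothetical in-memory FakeCluster and AlreadyExistsException (standing in for IndexAlreadyExistsException). It assumes index creation is atomic on the server side, and it does not address whether an existing index is ready to be updated yet.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical exception mirroring IndexAlreadyExistsException from this issue.
class AlreadyExistsException extends RuntimeException {
    AlreadyExistsException(String index) {
        super("[" + index + "] already exists");
    }
}

// Hypothetical in-memory cluster; createIndex is atomic (Set.add is atomic here).
class FakeCluster {
    private final Set<String> indices = ConcurrentHashMap.newKeySet();

    void createIndex(String name) {
        if (!indices.add(name)) {
            throw new AlreadyExistsException(name);
        }
    }

    boolean exists(String name) {
        return indices.contains(name);
    }
}

class EnsureIndex {
    // Returns true if this call created the index, false if it already existed.
    // The answer comes from the atomic create itself, so there is no window
    // between a check and the action for the cluster to change in.
    static boolean ensureIndex(FakeCluster cluster, String name) {
        try {
            cluster.createIndex(name);
            return true;
        } catch (AlreadyExistsException e) {
            return false; // index is already there: caller takes the update path
        }
    }

    public static void main(String[] args) {
        FakeCluster cluster = new FakeCluster();
        System.out.println(ensureIndex(cluster, "foobar")); // true: created
        System.out.println(ensureIndex(cluster, "foobar")); // false: already existed
    }
}
```

The design choice is to let the operation that can fail report the state, rather than asking first and hoping nothing changes before acting.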

But maybe it's only a local mode quirk. That is still pretty annoying, because it's hard to test with local mode when it introduces its own strange behaviors. Whenever a test fails, I have to try to figure out if it's exposing a real buggy behavior or if it's just local mode being weird.

zakmagnus commented 9 years ago

A few more interesting notes:

Index creation is asynchronous (right?), so I actually think we can't expect Stefan's test case to pass. There doesn't seem to be any waiting after the index creation call, and that call returning does not mean the index has actually been created. For this reason alone, the test case may fail.

I tried waiting for yellow state and for an active shard after creating an index, but it didn't work. It would still often say that the index did not exist, even after those waits returned successfully, if asked quickly enough. This seems like a local mode problem, at the very least.

I should note that I was also shutting down the local node between every operation, to stress it more. For example: start local node, create index, wait for health state, shut down node, start node, check if index exists, shut down node. I inserted a health wait after creation, but it didn't help. However, if I inserted a health wait right before the check, I could not get it to report that the index doesn't exist. I can't tell if this is because yellow state is somehow necessary, or if the extra call just delayed things enough to change the outcome of the race I seemed to be observing.

I could not reproduce this issue with an actual external cluster, as opposed to local mode. As long as I waited for a healthy state right after each creation call, each index existence check returned accurately, regardless of what waits I did beforehand. There's a lot more time between events when I have to poke an external program, though, so again, I can't tell if this just slows things down enough to hide any race conditions. But I'm pleased that at least it's not trivial to get an actual ES cluster to give me wrong information.

clintongormley commented 9 years ago

Also see #9126

Schaka commented 8 years ago

I ran into this same issue with 5.0.0-alpha3.

During index recovery, which is asynchronous, we cannot reliably check whether the index exists. So we need to wait for the cluster to start fully (or for a recovery event, which currently doesn't seem possible) before we check whether the index exists and THEN create it.
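When no listener event matches the condition you need, a generic fallback is to poll with a deadline. The sketch below is plain Java with a hypothetical waitUntil helper; the condition passed in is an assumption (for example, a check that the cluster health status is at least yellow). Note that the cluster health API can already block server-side via its wait-for-status option, so a helper like this is mainly useful for conditions that have no built-in wait.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

class Poll {
    // Re-evaluates the condition every `interval` until it returns true or
    // `timeout` elapses. Returns true on success, false on timeout/interrupt.
    static boolean waitUntil(BooleanSupplier condition, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;
            }
            try {
                // Never sleep past the deadline; sleep at least 1 ms.
                Thread.sleep(Math.min(interval.toMillis(), Math.max(1L, remainingNanos / 1_000_000L)));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        // Hypothetical condition: "recovery finished" becomes true after ~300 ms.
        BooleanSupplier recovered = () -> System.nanoTime() - start > Duration.ofMillis(300).toNanos();
        boolean ok = waitUntil(recovered, Duration.ofSeconds(2), Duration.ofMillis(50));
        System.out.println(ok ? "recovered, safe to check exists" : "timed out");
    }
}
```

Returning a boolean instead of throwing lets the caller decide whether a timeout means "retry later" or "fail startup".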

I thought this would be possible by registering a ClusterStateChangeListener, but none of the events that fired seemed to be the one I needed.

Any workaround?

ywelsch commented 8 years ago

I've opened #19047 that fixes this issue.