elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

"block until refresh" indexing option #1063

Closed: caravone closed this issue 8 years ago

caravone commented 13 years ago

Feature request: Provide an option to the index operation that will wait until the next scheduled refresh occurs before returning a response. After the response is returned, all documents indexed in that operation should be visible for search.

s1monw commented 8 years ago

@nik9000's idea made me wonder whether this could be done in a more contained way that doesn't require user interaction, can sustain a high indexing load, etc., and I think I have a good plan:

  1. when we index, we either do it in bulk or as a single indexing request. Both are executed first on the primary and then on the replicas. Once the replicas respond, we respond to the user.
  2. on indexing we can record the maximum translog location, which we already do for the translog fsync.
  3. if the request has force_visible:true, we can register a listener on, for instance, IndexShard that holds the request's maximum translog location and gets invoked on every refresh of the shard. The listener would be registered on both the primary and the replicas, and sends the response once it is called and the location is visible.
  4. in the engine we just need to pull the current translog location before we refresh and store it on the engine; that way we can easily tell whether something is visible, since there is a clear greater-than-or-equal relationship: everything at a location less than or equal to the refreshed location is visible.
  5. What if:
    • what if we pile up too many listeners? -> we force a refresh and free up the pending listeners
    • what if we time out because there are no refreshes and no new requests that would trigger one? -> we fire a refresh before we time out, and only if that doesn't free up the listener do we time it out.

I think this could work well, and it's index-time only; nothing needs to be shipped to the client or sent back. It works per shard and can heal itself if it becomes unhealthy.
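A minimal sketch, in Java, of the mechanism outlined above. All names here (RefreshListeners, PendingListener) are illustrative, and translog locations are simplified to plain longs, which the real translog does not use:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Per-shard registry of "respond once my write is searchable" listeners. */
class RefreshListeners {
    /** Highest translog location made visible by the last refresh. */
    private long refreshedLocation = -1;
    private final List<PendingListener> pending = new ArrayList<>();
    private final int maxPending;            // step 5: bound the queue
    private final Runnable forceRefresh;     // fired when the bound is hit

    RefreshListeners(int maxPending, Runnable forceRefresh) {
        this.maxPending = maxPending;
        this.forceRefresh = forceRefresh;
    }

    /** Called when an indexing request that wrote up to {@code location} wants to wait. */
    synchronized void addListener(long location, Runnable respond) {
        if (location <= refreshedLocation) {
            respond.run();                   // already visible: respond immediately
            return;
        }
        pending.add(new PendingListener(location, respond));
        if (pending.size() > maxPending) {
            forceRefresh.run();              // too many waiters: force a refresh
        }
    }

    /** Called after every refresh with the translog location captured just before it. */
    synchronized void afterRefresh(long locationBeforeRefresh) {
        refreshedLocation = locationBeforeRefresh;
        Iterator<PendingListener> it = pending.iterator();
        while (it.hasNext()) {
            PendingListener l = it.next();
            if (l.location <= refreshedLocation) {  // "less than or equal" => visible
                it.remove();
                l.respond.run();
            }
        }
    }

    private static final class PendingListener {
        final long location;
        final Runnable respond;
        PendingListener(long location, Runnable respond) {
            this.location = location;
            this.respond = respond;
        }
    }
}
```

The greater-than-or-equal comparison is what keeps the bookkeeping cheap: after each refresh the shard only has to remember one location, not the set of operations the refresh exposed.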

nik9000 commented 8 years ago

greater-than-or-equal relationship

Right. This is much better than using time.

I wasn't as happy with the whole do it at index time thing at first but I think it'll work out fine with the kind of limits you've outlined.

Funbit commented 8 years ago

can you describe what you are trying to do? Like, if it's not private or secret or anything. I'm not sure I'm keeping the right model of this use case in my head.

@nik9000 Sure. Our use case is pretty common. We have an SQL database containing file system information (folders and files), and we use Elasticsearch for searching it. Since the file system data changes pretty frequently, we use a queue-based approach to synchronize the file system with the Elasticsearch index. To guarantee consistency, the queue is processed synchronously and sequentially by a single server (let's call it X). So, for example, if a new file is added we put an "AddFile" message on the queue; when one is deleted, a "DeleteFile" message. Server X processes each message by updating the Elasticsearch index. This works pretty well for us. The only problem is that sometimes we need to perform bulk updates (as in the "RenameFolder" case), and in theory that can take a while, so we have to make sure that server X starts processing the next message from the queue only once Elasticsearch has finished its work and the index has refreshed. We don't care how long that takes, a second or even a minute; we just need a way to guarantee that the data is fresh. I know this use case might sound like we're trying to use Elasticsearch as a database, but I would disagree with that.
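A hypothetical sketch of that loop, with the queue, message types, and helper interfaces as stand-ins rather than Funbit's actual code; waitUntilVisible() is exactly the primitive this issue asks Elasticsearch to provide:

```java
import java.util.concurrent.BlockingQueue;

class FsSyncWorker implements Runnable {
    private final BlockingQueue<FsMessage> queue;   // AddFile, DeleteFile, RenameFolder, ...
    private final EsIndexer indexer;                // stand-in for the ES client wrapper

    FsSyncWorker(BlockingQueue<FsMessage> queue, EsIndexer indexer) {
        this.queue = queue;
        this.indexer = indexer;
    }

    @Override
    public void run() {
        try {
            while (true) {
                FsMessage msg = queue.take();       // single worker => strict ordering
                indexer.apply(msg);                 // translate to index/delete/bulk calls
                indexer.waitUntilVisible();         // the missing primitive: block until
                                                    // the changes are searchable
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    interface FsMessage {}
    interface EsIndexer {
        void apply(FsMessage msg);
        void waitUntilVisible() throws InterruptedException;
    }
}
```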

nik9000 commented 8 years ago

I know this use case might sound like we're trying to use Elasticsearch as a database, but I would disagree with that.

Nah. It's fine. I don't know that this is the right way to get the kind of backpressure you're looking for, but it'll help anyway.

I've been working on this in between some other things that are pretty time-consuming. It's coming along, but it's a lot of code I'm not yet familiar with, so it's taking longer than it sounds like it should.

mikemccand commented 8 years ago

Note that Lucene has a (confusingly named) class designed for precisely this use case, ControlledRealTimeReopenThread: every indexing op is assigned a long generation counter telling you which searcher refresh will expose the change.

For those (hopefully rare-ish) indexing requests where you must know when they have become visible, you ask this class to block until that generation has been refreshed.

I think it's similar to @s1monw's approach, except that his approach uses the translog location, which we are already tracking per indexing op, instead of a separate long generation, which is nice.
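For reference, a sketch of how that class is wired up, assuming the Lucene 5.x-era API (TrackingIndexWriter was later folded into IndexWriter, so signatures differ in newer Lucene):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.TrackingIndexWriter;
import org.apache.lucene.search.ControlledRealTimeReopenThread;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.SearcherManager;

class GenerationBlockingDemo {
    static void demo(IndexWriter writer, Document doc) throws Exception {
        // wrap the writer so every op hands back a generation number
        TrackingIndexWriter tracking = new TrackingIndexWriter(writer);
        SearcherManager manager = new SearcherManager(writer, true, null);
        // reopen at least every second, at most every 25ms
        ControlledRealTimeReopenThread<IndexSearcher> reopener =
            new ControlledRealTimeReopenThread<>(tracking, manager, 1.0, 0.025);
        reopener.start();

        long gen = tracking.addDocument(doc);  // the generation that will expose this doc
        reopener.waitForGeneration(gen);       // blocks until a refresh covering gen
        IndexSearcher searcher = manager.acquire();
        try {
            // searches here are guaranteed to see `doc`
        } finally {
            manager.release(searcher);
        }
    }
}
```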

bleskes commented 8 years ago

If I read ControlledRealTimeReopenThread correctly, it's blocking, right? I think Simon was talking about an async approach where we don't block indexing threads, but rather don't respond to the network message until something happens.

mikemccand commented 8 years ago

If I read ControlledRealTimeReopenThread correctly, it's blocking, right?

Hmm, you don't have to use it that way ... yes, it does have blocking methods (waitForGeneration) to stall the calling thread until the specified gen has become live ... but for the async solution we could just ask it, after each refresh (when we want to call the listeners, which go and finish the network request), which gen is now live, and invoke the listeners.

Hmm actually it has no getter for this. I'll add that!

nik9000 commented 8 years ago

So I have 3/4 of a patch for this using ES's translog. I'm distracted by reindex, but I'm happy to come back to this eventually. Or have someone else take it; that's fine too.

mikemccand commented 8 years ago

So I have 3/4 of a patch for this using ES's translog.

Sorry, I think the translog solution is a great way to do it as well, since it already provides us a monotonically growing long value for each indexing operation (hmm, just take care when we switch to a new translog).

And if we ever wanted to cut over to Lucene's ControlledRealTimeReopenThread, most of your change would still be needed (handling the new force_visible param, invoking the async listeners after refresh).

So please keep going!

nik9000 commented 8 years ago

I just want to mention that this is still on my list, but it is behind reindex, so it'll still be some time before I get to it. If someone wants to take it from me, I can hand it over. If not, please be patient.

nik9000 commented 8 years ago

It has been many months, but I finally proposed a patch for this. Go have a look at #17986 if you are curious.

nik9000 commented 8 years ago

So after opening a PR to solve this (#17986), it has become apparent that it adds a decent chunk of complexity. @bleskes reviewed it and proposed some simplifications. But he also said something like, "Why don't folks just use ?refresh on all of these requests? Sure, indexing is less efficient, but for low-throughput users it should be OK, and for high-throughput users your solution isn't quite right anyway." And I think he's right.

So, new proposal: just use ?refresh if you have fewer than 30 changes per second. Maybe 50. Something like that.
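For context, that interim advice in Java client terms. This is a minimal sketch assuming the 2.x-era API (the setRefresh(true) setter is the one a later comment in this thread says will be deprecated); the index name, type, and document source are placeholders:

```java
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.client.Client;

class RefreshOnWrite {
    /** Index one document and force a refresh so it is immediately searchable. */
    static IndexResponse indexVisible(Client client, String id, String json) {
        return client.prepareIndex("files", "file", id)  // placeholder index/type
            .setSource(json)
            .setRefresh(true)   // fine at low write rates; costly at high ones
            .get();
    }
}
```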

If you have more indexing operations per second than that, keep doing what you were doing. I presume you are just telling your users "it takes time for changes to become visible", or something like that?

So, yeah, the proposal in #17986 starts to break down if you have more than 100 blocked requests. That number is configurable, but if you go above it you just get a refresh anyway. The idea is that holding an unbounded number of listeners in memory could eat a ton of resources. Unbounded is always bad.

I'm really curious: for the people for whom just ?refresh-ing doesn't work, what is your use case? I think, for example, that @nharraud would probably be OK with ?refresh unless they have dozens of scheduling changes per second.

What we're trying to figure out is how many people would really benefit from this. At this point I've spent a few days implementing the PR, but it'd take another week or so to get it ready to merge, and we'd have to maintain it, which will take much more time.

sandermarechal commented 8 years ago

I think ?refresh will work for my use case here.

s1monw commented 8 years ago

If you start calling ?refresh you hammer the index with tiny, tiny segments; you really only want to move stuff to disk when it's needed. It is already a problem to use it, IMO, since it's slow and puts a lot of load on the system. I am not sure where the complexity is coming from, and why is the queue a problem? It's not eating any resources except some metadata in RAM for a little while, no?

roytmana commented 8 years ago

I do not think ?refresh is a good option either; you would just be shifting the unbounded behaviour (too-frequent refreshes) to where it really causes harm.

bleskes commented 8 years ago

Here's my reasoning:

If you index a few docs a second, I don't think any of this matters, as the merge policy will clean things up quickly. It also works if you have a few indexing requests that need a refresh and a lot in the background that don't. I would be interested to know the high-ingestion-rate use cases where acking an index operation must also mean "it's visible to all searches", accepting the price in indexing-request latency (which can be anywhere up to 2 seconds using the defaults, and potentially more if the refresh interval is increased) and the implications it may have for client-side concurrency. Note that this doesn't give any guarantee that the doc will not be visible until then: replicas still have their own refresh rates, etc.

I am not sure where the complexity is coming from

I think any change touching both the internal engine and the replication action should be considered with care. I'm all for it if we decide it's important, but I want to understand why.

@roytmana can you elaborate on your use case and why you need this feature? I only see a generic +1 up there...

nik9000 commented 8 years ago

I think any change touching both the internal engine and the replication action should be considered with care. I'm all for it if we decide it's important, but I want to understand why.

I think we wouldn't touch InternalEngine all that much after moving the queueing up a level.

bleskes commented 8 years ago

I think we wouldn't touch InternalEngine all that much after moving the queueing up a level.

Agreed, but I think you see my point (and it will still require a refresh listener on InternalEngine plus a concurrency construct at the IndexShard level).

nik9000 commented 8 years ago

Agreed, but I think you see my point

Yes.

and why is the queue a problem? It's not eating any resources except some metadata in RAM for a little while, no?

Yeah, I think maybe I should raise the default maximum size by a factor of ten or so.....

roytmana commented 8 years ago

@bleskes My case is probably very similar to other people's. In some cases I think it would be reasonable to use Elastic as the primary data store (rather than one feeding off of, and synchronized with, another one). In this case it is a typical user interaction to add/modify a record and have it appear on the respective "lists" by blocking the UI till it is available in ES. While several workarounds are possible, in a more generic case, like needing to create multiple related entries in multiple indices to represent a business entity returned to the user, it gets more complex to roll my own solution, and some options (like using the get API) are no longer feasible. I did not explore the possible use cases in depth because there was not much movement on this issue, so I did not think it would ever get implemented. I would be interested in analyzing possible use cases further if the ES team is seriously planning to implement it :-)

bleskes commented 8 years ago

Thanks @roytmana

In this case it is a typical user interaction to add/modify a record and have it appear on the respective "lists" by blocking the UI till it is available in ES

How many of these updates do you expect? Is it acceptable for you to have latencies of >1s for those data updates?

if the ES team is seriously planning to implement it :-)

We are seriously evaluating it and trying to figure out the pros and cons, as you can see :)

roytmana commented 8 years ago

@bleskes I am glad to see it :-)

I think we could have a latency of 2-3 seconds, but going much beyond that will make the UI appear sluggish. In our case the rate of interactive (UI-driven) additions/changes will not be very high: I would estimate 10-15 UI actions per second needing blocking, but that could translate to 4-5 times as many (50-60) ES documents (each action could be batched together, I guess).

nik9000 commented 8 years ago

(each action could be batched together, I guess)

If you can send them all as a single _bulk request, that'd work with this solution.
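A minimal sketch of that pattern, written against the Java API that eventually shipped for this feature (see the end of this thread). The index name, type, and document map are placeholders, and WAIT_UNTIL is, as far as I can tell, the released enum constant behind ?refresh=wait_for:

```java
import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.support.WriteRequest;
import org.elasticsearch.client.Client;

class BulkVisibleWrite {
    /** Send related changes as one _bulk request and wait until all are searchable. */
    static BulkResponse bulkVisible(Client client, java.util.Map<String, String> docs) {
        BulkRequestBuilder bulk = client.prepareBulk()
            .setRefreshPolicy(WriteRequest.RefreshPolicy.WAIT_UNTIL); // ?refresh=wait_for
        for (java.util.Map.Entry<String, String> d : docs.entrySet()) {
            bulk.add(client.prepareIndex("fs", "file", d.getKey())   // placeholder index/type
                .setSource(d.getValue()));
        }
        return bulk.get(); // returns once every change in the bulk is visible
    }
}
```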

The solution as proposed never leaks across the wire, or back to the user, the actual thing we use to ensure the read-after-write-ness. Any solution that lets you make multiple API calls and then wait for them all to become visible necessarily requires Elasticsearch to leak that information in the form of a token that we send to you and that you then send back to us. And that would make this whole thing more complex, especially from a backwards-compatibility standpoint.

MrHash commented 8 years ago

On the one hand, Elasticsearch has evolved as an easily scalable, distributed, NRT (secondary) data store with advanced search features; on the other hand, there is some desire from the community for the team to implement a blocking pattern to serve a realtime (primary) data store use case in a CRUD context. While this is understandable, implementing this feature will ultimately not solve the architectural problems of the latter use case while meeting the goals of the first, so for this reason I would like to downvote this feature and humbly advise people to take a different approach to achieving UI consistency in their applications.

If you do go ahead with incorporating this into core, then I would urge caution against compromising the quality of the platform for those of us who are using it without any problem in an NRT fashion. Perhaps this feature could be offered as a plugin?

nik9000 commented 8 years ago

I would urge caution against compromising the quality of the platform

It isn't worse, just more complex.

Perhaps this feature could be offered as a plugin?

I can't see a way to do that that doesn't make it 10x more complex.

roytmana commented 8 years ago

@nik9000 I would definitely _bulk related changes together; I would not want to do them one by one synchronously, accumulating latency, or in parallel, having to fire them on multiple threads and block until all are done. _bulk is the way to go. My concern is with ?refresh: with the number of users growing we will end up with tens and later hundreds of refreshes a second, and that can't be good.

nik9000 commented 8 years ago

100s of requests per second isn't sustainable, no. @bleskes was asserting to me this morning that 30 or so a second isn't horrible though. I've not tried it personally. I suppose I should just try it on my laptop....

s1monw commented 8 years ago

From what I have seen, a lot of search-driven sites that have workflows need a feature like this so they don't refresh like crazy. They don't require the indexing operation to return fast, but they need to make sure that once it has returned, the change is visible. The indexing pattern is unknown, but I think there will be peaks and such.

bleskes commented 8 years ago

I think we could have a latency of 2-3 seconds, but going much beyond that will make the UI appear sluggish

This is of course up to you, but if I were building a UI, a 2s call would not be acceptable.

I would estimate 10-15 UI actions per second needing blocking

If you bulk all the ES requests together, or just issue a single refresh call at the end, you should be fine. Your use case is one of the reasons the refresh flag exists to begin with.

with the number of users growing we will end up with tens and later hundreds of refreshes a second

100s of requests per second isn't sustainable, no. @bleskes was asserting to me this morning that 30 or so a second isn't horrible though. I've not tried it personally. I suppose I should just try it on my laptop....

I think it's important to note that these numbers are per shard. I'm not sure those hundreds of refreshes per second would all go to a single index with a single shard.

I just ran a quick test on my laptop using two Python scripts running in a tight loop, indexing a single doc in each call, all with refresh=true. Using a single node and a single shard, I had a sustained rate of ~80 docs/sec. The ES node was churning ~180% CPU, and the segment count stayed totally under control. I kept this up for 30 minutes.

From what I have seen, a lot of search-driven sites that have workflows need a feature like this so they don't refresh like crazy. They don't require the indexing operation to return fast, but they need to make sure that once it has returned, the change is visible. The indexing pattern is unknown, but I think there will be peaks and such.

To be clear, I'm OK with implementing this if people think it's needed (I actually like it from a technical perspective). However, in all the use cases mentioned so far that doesn't seem to be the case. When we talk about peak indexing rates, I think we're still good as well: if you have a burst of 10 seconds' worth of indexing, the system will clean up all the segments once it's done.

benoitvidis commented 8 years ago

In our use case, we use ES as our primary store, but we don't control the way people will eventually use it. We have already had some feedback about some of our users being confused by the NRT behaviour.

To my understanding, as @s1monw says, most users are actually looking for the block_until_refresh functionality, for which using refresh can be a workaround in some situations.

@bleskes: A typical use case we banged our heads on some time ago in some other projects is cache invalidation. We had some websites that received content updates in batches, leading to high indexing rates. The indexing time did not matter much, but we needed to be sure the index was up to date when we refreshed the page cache for the content that had changed. Such a feature would definitely have helped.

Another problem is that we, as developers, are lazy. We write a piece of code where some data is indexed. Another function somewhere else then tries to read the data while it is not yet ready for search. Soon a bug is opened, and the developer finally fires a refresh after every single write operation in the codebase, needed or not, "just in case".
I understand that block_until_refresh adds some complexity and is not a panacea, but it looks much less harmful than the current workaround to me.

To sum up: +1

Funbit commented 8 years ago

Also, please don't forget the use case I described above: https://github.com/elastic/elasticsearch/issues/1063#issuecomment-148918276

block_until_refresh would solve the problem for us.

LiquidMark commented 8 years ago

We have multiple background workers processing workflow sequences. For a sequence to continue, it needs to know that the ES index contains what has just been put into it. So we do a refresh. But this will scale poorly: we are expecting (OK, hoping) that site traffic will drive this to hundreds of such refreshes per second. What's the alternative? Wait for 1 second for the NRT refresh to happen? What if ES gets heavily loaded? Is 1 second elapsed always sufficient? Better make it 2. Or 3? That isn't robust, and it means that processing that background task, which provides our own NRT behavior to all users of our system, is slowed down significantly. The proposed block_until_refresh would return, on average, in 1/2 second, right?

+1 for block_until_refresh

bleskes commented 8 years ago

most users are actually looking for the block_until_refresh

Are they willing to have their indexing requests take 1 second?

The indexing time did not matter much, but we needed to be sure the index was up to date when we refreshed the page cache for the content that had changed.

This is actually a good use case. That said, if your content updates are batched, you can just issue a refresh to ES after each batch is loaded and before you invalidate the cache.

I understand that block_until_refresh adds some complexity and is not a panacea, but it looks much less harmful than the current workaround to me.

I think the biggest downside of using block_until_refresh as a magic pill is that indexing operations will take 1s. This will be surprising, and it means it can only be used where that doesn't matter.

bleskes commented 8 years ago

@Funbit

Also, please don't forget the use case I described above: #1063 (comment)

If I read it correctly, you are mentioning a special case where you rename a folder and need to do some special handling. In that case, you can call _refresh at the end of the operation to achieve the same result (assuming the rename is not super frequent).

@LiquidMark

For a sequence to continue, it needs to know that the ES index contains what has just been put into it.

To be clear, normal indexing (no refresh) does put everything into the index; it's just not visible to searches. It is as safe as using refresh.

What's the alternative? Wait for 1 second for the NRT refresh to happen?

Block until refresh does exactly that for you (except it's more efficient as it knows if a refresh happens before the 1s has passed). You will not get fast indexing from it.

The proposed block_until_refresh would return, on average, in 1/2 second, right?

It's hard to say. It will first wait for refresh to happen on all the replicas (in parallel) and then make sure that the primary has refreshed in the meantime as well.

...that site traffic will drive this to hundreds of such refreshes per second

What type of site is it? Can you elaborate?

Funbit commented 8 years ago

@bleskes

If I read it correctly, you are mentioning a special case where you rename a folder and need to do some special handling. In that case, you can call _refresh at the end of the operation to achieve the same result (assuming the rename is not super frequent)

Yes, that is what we do now: call refresh after every write operation, even when we don't need to. With block_until_refresh it would be easier and faster. Also, we don't care how long it takes to return, 5 seconds or even 30 seconds.

LiquidMark commented 8 years ago

@bleskes

To be clear, normal indexing (no refresh) does put everything into the index; it's just not visible to searches. It is as safe as using refresh.

Fair enough, given that I didn't express myself completely. What I meant is that the worker which has just done the indexing wants that data included in its subsequent searches.

What's the alternative? Wait for 1 second for the NRT refresh to happen?

Block until refresh does exactly that for you (except it's more efficient as it knows if a refresh happens before the 1s has passed). You will not get fast indexing from it.

Understood. But isn't the efficiency of not doing many refresh requests per second worthwhile?

The proposed block_until_refresh would return, on average, in 1/2 second, right?

It's hard to say. It will first wait for refresh to happen on all the replicas (in parallel) and then make sure that the primary has refreshed in the meantime as well.

OK.

...that site traffic will drive this to hundreds of such refreshes per second

What type of site is it? Can you elaborate?

It's a marketplace where buyers may bid on hundreds, or even thousands, of acceptable alternatives at the same time. (When a buyer's needs are satisfied, alternative bids are automatically withdrawn, effectively simultaneously. We use a database with traditional ACID properties to enforce this guarantee.)

Alternative bids may be expressed individually, or as a combination of filters and attribute-based price calculators. We are using ES for bid/ask matching as well as for more traditional search. Native scripts execute to determine bid prices for items for sale.

Automatic bidding will cause all outstanding filter/calculator-based bidders to evaluate, and possibly bid for or buy, new items as they are posted to the system. Automatic ascending bidding (where all alternative bids are raised together) and automatic descending item ask prices will result in bid and ask amounts constantly changing, with every buyer or seller potentially accounting for thousands of updates daily.

bleskes commented 8 years ago

@LiquidMark thx for explaining. I can see how block_until_refresh can greatly simplify the logic of background tasks, where the sheer number of them prevents the use of a classic refresh. I'm +1 on proceeding with this.

nik9000 commented 8 years ago

I'm +1 on proceeding with this.

Happy times! I'll get back to the PR in a bit then!

nik9000 commented 8 years ago

Ok! An update: #17986 is almost ready to be merged. I haven't been working on it for 21 days straight, but it really did take about a week of my time, like I said it would. And a good deal of review from other wonderful folks.

Anyway, instead of adding ?block_until_refresh, the syntax will be ?refresh=wait_for. The idea is that we already have the ?refresh flag, and it doesn't make sense to have both ?refresh and ?block_until_refresh.

In the Java API the way to do this will be index.setRefreshPolicy(WAIT_FOR), and index.setRefresh(true) will be deprecated. Actually, I think we'll remove it in favor of index.setRefreshPolicy(IMMEDIATE), but that'll require changing a lot of tests, everywhere.
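For reference, a sketch of the three resulting policies in Java client terms. As far as I can tell, the enum constant that actually shipped is WAIT_UNTIL (mapping to ?refresh=wait_for at the REST layer), so treat the exact constant names here as approximate:

```java
import org.elasticsearch.action.index.IndexRequestBuilder;
import org.elasticsearch.action.support.WriteRequest.RefreshPolicy;

class RefreshPolicies {
    static void applyPolicies(IndexRequestBuilder request) {
        request.setRefreshPolicy(RefreshPolicy.NONE);       // default: wait for a scheduled refresh
        request.setRefreshPolicy(RefreshPolicy.IMMEDIATE);  // ?refresh=true: refresh right away
        request.setRefreshPolicy(RefreshPolicy.WAIT_UNTIL); // ?refresh=wait_for: block until visible
    }
}
```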

nik9000 commented 8 years ago

This has been implemented and should be available in the next alpha/beta of 5.0.0. This is the PR for it: https://github.com/elastic/elasticsearch/pull/17986

LiquidMark commented 8 years ago

Yahoo! Thank you!

luyuncheng commented 7 years ago

+1

JonSchapiro commented 7 years ago

+1

maxgurewitz commented 7 years ago

+1

rh78 commented 7 years ago

Thanks - it's there, and I need it just now :)

consulthys commented 6 years ago

Another possibility is to use the https://github.com/ForgeRock/es-change-feed-plugin plugin and subscribe to indexing/deletion events as they happen.