django-es / django-elasticsearch-dsl

This is a package that allows indexing of django models in elasticsearch with elasticsearch-dsl-py.

[Showerthought] Use virtual indexes for zero-downtime rebuilds? #75

Open Grendel7 opened 6 years ago

Grendel7 commented 6 years ago

Right now, when you rebuild an index, the index is nuked first, then rebuilt from scratch. During this reindexing process, any searches against the index might fail.

Instead, you could use "virtual indexes" to perform a rebuild without downtime. By that, I mean that you create a real index with a different name, e.g. index_name.<timestamp>. You can then create an alias named index_name and point it to the real index.

When rebuilding the index, you could create a new index in the background, populate it, then switch the aliases over. That way, the application can still use the old index while the new index is being created.
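As a rough illustration of the "switch the aliases over" step, here is a minimal sketch that builds the request body for Elasticsearch's `_aliases` endpoint. The helper name `build_alias_swap` and the example index names are made up for illustration and are not part of this package; the sketch only constructs the payload and does not talk to a cluster.

```python
from typing import Optional


def build_alias_swap(alias: str, new_index: str, old_index: Optional[str] = None) -> dict:
    """Build an `_aliases` request body that repoints `alias` at `new_index`.

    Hypothetical helper: it returns the payload only. Sending it (for example
    through the Python client's `indices.update_aliases`) is left to the caller.
    """
    actions = []
    if old_index is not None:
        # Detach the alias from the index that is currently being served.
        actions.append({"remove": {"index": old_index, "alias": alias}})
    # Attach the alias to the freshly populated index.
    actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}


# Example names are assumptions following the index_name.<timestamp> idea above.
body = build_alias_swap("products", "products.20190605", "products.20190101")
```

Because both actions travel in one request, searches through the alias keep hitting the old index until the new one takes over.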

Most Elasticsearch applications I know use something like this and I'm willing to contribute something similar to this project. However, before I do that, I would like to know whether this is a desirable feature to have or whether it's unnecessary complexity for a generic library.

andreyrusanov commented 6 years ago

We used a (self-made) feature like this on several projects as well, because migrations happen along the way for both the DB and the search engine. We used raw numbers instead of timestamps for simplicity (at least in our case it was simpler).

andreyrusanov commented 6 years ago

One note: I believe that if this is introduced, it needs to be done explicitly, with a management command or something similar.

ezbc commented 5 years ago

I need this feature for a current project. Is this feature still desired in the library? If so I can start a PR.

From what I understand, the command would accept the name of the index to build in the background and the name of the alias. The command creates and populates the new index. When the new index has finished building, the alias is updated to point to it.

Is this the desired behavior?

josh-stableprice commented 5 years ago

I would suggest that the alias name and potentially the new index name suffix are configurable.

For example, adding --alias products and --index_suffix 20190605.

This would then allow people to attach whatever meaning they need to their reindexes. The index alias could default to f'{index_name}_alias' and the indexes to f'{index_name[:64]}_{uuid4()}'.
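A minimal sketch of those suggested defaults might look like the following. The function names are hypothetical, and using `uuid4().hex` (32 lowercase hex characters, no dashes) is one possible reading of the suggestion rather than anything the package defines.

```python
from uuid import uuid4

# Truncation length taken from the suggestion above.
MAX_PREFIX_LEN = 64


def default_alias_name(index_name: str) -> str:
    """Default alias name: the index name with an `_alias` suffix."""
    return f"{index_name}_alias"


def default_index_name(index_name: str) -> str:
    """Default concrete index name: truncated index name plus a random suffix.

    `uuid4().hex` is lowercase, which matters because Elasticsearch index
    names must be lowercase.
    """
    return f"{index_name[:MAX_PREFIX_LEN]}_{uuid4().hex}"


alias = default_alias_name("products")
index = default_index_name("products")
```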

ezbc commented 5 years ago

Good thinking. Should the user be able to create an alias for each model so the virtual reindexing could be done for each model in the registry in one command? Building off your suggestion, perhaps the CLI could look like:

--alias_prefix alias_prefix and --index_suffix 20190605, and for each model an alias would be created following {alias_prefix}_{model_name}, or some similar naming scheme based on the model name and alias prefix?
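To make the proposed CLI concrete, here is a small sketch using plain `argparse` (Django management commands also declare their options through an argparse parser). The program name `rebuild_virtual_indexes`, the helper `plan_names`, and the choice to name each concrete index `{alias}_{index_suffix}` are assumptions made for illustration only.

```python
import argparse
from typing import Dict, Iterable


def build_parser() -> argparse.ArgumentParser:
    """Parser for the two flags proposed above (hypothetical command name)."""
    parser = argparse.ArgumentParser(prog="rebuild_virtual_indexes")
    parser.add_argument("--alias_prefix", required=True)
    parser.add_argument("--index_suffix", required=True)
    return parser


def plan_names(alias_prefix: str, index_suffix: str, model_names: Iterable[str]) -> Dict[str, str]:
    """Map each alias ({alias_prefix}_{model_name}) to a concrete index name."""
    plan = {}
    for model_name in model_names:
        # Lowercased because Elasticsearch index names must be lowercase.
        alias = f"{alias_prefix}_{model_name}".lower()
        # Assumption: the concrete index is the alias plus the suffix.
        plan[alias] = f"{alias}_{index_suffix}"
    return plan


args = build_parser().parse_args(["--alias_prefix", "shop", "--index_suffix", "20190605"])
plan = plan_names(args.alias_prefix, args.index_suffix, ["Product", "Category"])
```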

How does this sound?

josh-stableprice commented 5 years ago

Sounds perfect

josh-stableprice commented 5 years ago

@ezbc if you take a look at https://github.com/rtfd/readthedocs.org/pull/4368/files#diff-2859d2a6db2d38d6545b0ecadbae2f61R58 it looks like @safwanrahman has already done all of this, along with making it Celery-based, in the readthedocs project. We would probably want to borrow heavily from this.

safwanrahman commented 5 years ago

Thanks @ezbc for your interest. Yes, this feature is very much desired. I implemented this feature in Read the Docs, but did not get time to push it to this package. If you would like to start working on this, I would be very happy to assist you with it. You can borrow the implementation I did in RTD, as mentioned by @josh-stableprice.

ezbc commented 5 years ago

Thanks for pointing out the PR for RTD. I’ll get started on this feature this week and bring up any issues or questions along the way.

ezbc commented 5 years ago

@josh-stableprice and @safwanrahman I'm wondering if we should always delete the old index or not after a successful population of a new index. One use case I can think of for keeping indexes is if a user wanted to verify the new index before switching over the alias.

If we did not delete the old index automatically, that would open a can of worms for the user, who would have to manage existing indexes, e.g. change aliases and delete old indexes. One option is to automatically delete the old index for now, and later add an option to keep the old index along with commands to manage old indexes.

What are your thoughts?

josh-stableprice commented 5 years ago

Your last thought basically hits the nail on the head. However, don't kill yourself trying to implement all of that in one go unless you have the spare time. I'd just implement a virtual index for now, with automatic replacement if indexing had no errors.
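The "automatic replacement if indexing had no errors" flow could be sketched roughly as below. `FakeCluster` is an in-memory stand-in invented for this example, not a real Elasticsearch client, and `rebuild` is a hypothetical helper that only shows the control flow: build the new index, switch the alias and delete the old index on success, and leave everything as it was on failure.

```python
from typing import Callable, Iterable


class FakeCluster:
    """In-memory stand-in for a search cluster (illustration only)."""

    def __init__(self) -> None:
        self.indexes = {}  # index name -> list of indexed documents
        self.aliases = {}  # alias -> index name


def rebuild(cluster: FakeCluster, alias: str, new_index: str,
            documents: Iterable[dict], index_document: Callable[[dict], dict]) -> bool:
    """Populate `new_index`, then switch `alias` only if nothing failed."""
    if new_index in cluster.indexes:
        raise ValueError(f"index {new_index!r} already exists")
    old_index = cluster.aliases.get(alias)
    cluster.indexes[new_index] = []
    try:
        for doc in documents:
            cluster.indexes[new_index].append(index_document(doc))
    except Exception:
        # Population failed: drop the half-built index, keep the alias as is.
        del cluster.indexes[new_index]
        return False
    # Population succeeded: repoint the alias, then delete the old index.
    cluster.aliases[alias] = new_index
    if old_index is not None:
        cluster.indexes.pop(old_index, None)
    return True


def broken(doc: dict) -> dict:
    raise ValueError("bad document")


cluster = FakeCluster()
cluster.indexes["products_1"] = [{"id": 1}]
cluster.aliases["products"] = "products_1"

ok = rebuild(cluster, "products", "products_2", [{"id": 1}, {"id": 2}], dict)
failed = rebuild(cluster, "products", "products_3", [{"id": 1}], broken)
```

Keeping the old index instead of deleting it would only require skipping the final `pop`, which is where the index management commands discussed above would come in.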

(Hope this makes sense, answered this as soon as I woke up)


ezbc commented 5 years ago

@josh-stableprice or @safwanrahman, I'm getting back into this now.

I'm considering whether the aliases should all be updated in the same transaction, after every model has a newly rebuilt index. This seems like the safest option to me in case an app with multiple models deploys breaking changes to the model indexes.
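Elasticsearch does not have transactions in the database sense, but all actions sent in a single `_aliases` request are applied atomically, which gets close to this idea. A minimal sketch, assuming a hypothetical helper `build_bulk_alias_swap` and made-up alias and index names, that collects the swaps for every model into one request body:

```python
from typing import Dict


def build_bulk_alias_swap(new_indexes: Dict[str, str], current_indexes: Dict[str, str]) -> dict:
    """Build one `_aliases` body that repoints every alias at its new index.

    `new_indexes` maps alias -> freshly built index.
    `current_indexes` maps alias -> index it points to now (may lack entries).
    Hypothetical helper: it only builds the payload and sends nothing.
    """
    actions = []
    for alias, new_index in new_indexes.items():
        old_index = current_indexes.get(alias)
        if old_index is not None and old_index != new_index:
            actions.append({"remove": {"index": old_index, "alias": alias}})
        actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}


body = build_bulk_alias_swap(
    new_indexes={"shop_product": "shop_product_2", "shop_category": "shop_category_2"},
    current_indexes={"shop_product": "shop_product_1"},
)
```

Building the body only after every model has been reindexed successfully means a failure in any one model leaves all aliases untouched.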

What do you think?

josh-stableprice commented 4 years ago

I would agree if it makes it that much safer, as that's the overarching goal of using the index aliases.
