django-es / django-elasticsearch-dsl

This is a package that allows indexing of django models in elasticsearch with elasticsearch-dsl-py.

Rebuild index with zero down time #314

Open jphilip opened 3 years ago

jphilip commented 3 years ago

Hi, thanks for all the work. I was trying to write a task to rebuild an index without downtime on a single-node setup, based on this example. Basically it creates a new index and swaps aliases to replace the existing one, because an index cannot be renamed. I could make it work, but my new alias is not being used by django-elasticsearch-dsl, because it seems to rely only on the index name, not an alias. It currently takes only 1.2 minutes to rebuild my index, so the downtime is not too bad, but is there any way to do this now, or planned for the future? It would be nice to be able to rebuild indexes without downtime.
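For reference, the swap itself can be done atomically with the indices update_aliases API. A minimal sketch, assuming the elasticsearch-py 7.x client and placeholder index/alias names:

from elasticsearch_dsl.connections import connections

# django-elasticsearch-dsl configures this connection from the ELASTICSEARCH_DSL settings
es = connections.get_connection()

# Atomically repoint the alias from the old index to the freshly built one.
# "posts", "posts_old" and "posts_new" are placeholder names.
es.indices.update_aliases(body={
    "actions": [
        {"remove": {"index": "posts_old", "alias": "posts"}},
        {"add": {"index": "posts_new", "alias": "posts"}},
    ]
})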

jhofeditz commented 3 years ago

Instead of using an alias, I solved this by adding a second document that subclasses the original one, so I can still use the DSL in my scripts. Then I have a script like the one you linked to that rebuilds the tmp index and then copies from the tmp index to the public one without downtime.


class ModelDocument(ModelTmpDocument):
    """The actual user-facing index (ModelTmpDocument is the document bound to the tmp index)."""
    class Index:
        name = es_index  # name of the public index
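The copy step can be done with the reindex API; a rough sketch, assuming the elasticsearch-py 7.x client and placeholder index names:

from elasticsearch_dsl.connections import connections

es = connections.get_connection()

# Copy all documents from the tmp index into the public index while it keeps serving reads.
es.reindex(
    body={"source": {"index": "model_tmp"}, "dest": {"index": "model"}},
    wait_for_completion=True,
    request_timeout=600,
)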

antunesleo commented 3 years ago

I think this rebuild strategy can lead to information loss: what happens to documents created while the rebuild is running?

I'm currently studying a way to implement rebuilding with no downtime, and I've found this strategy interesting: https://medium.com/craftsmenltd/rebuild-elasticsearch-index-without-downtime-168363829ea4

Interesting as it is, it's complex to implement, since it relies on two different aliases, one for reads and another one for writes. While the migration is being executed, our models running in production must be able to insert data into the correct index, since we don't want to lose any data created while the migration is running.

To implement the strategy, I think we could do it "CQRS style", using two documents: one for reads and another one for writes, e.g.

from django_elasticsearch_dsl import Document


class NoDowntimeReadDocument(Document):
    """Maybe override all write methods and raise an exception to reinforce the design."""
    pass


class NoDowntimeWriteDocument(Document):
    """Maybe override all read methods and raise an exception to reinforce the design."""
    pass


class ExampleRead(NoDowntimeReadDocument):
    """The actual user-facing read index."""
    class Index:
        name = read_example  # alias used for reads


class ExampleWrite(NoDowntimeWriteDocument):
    """The actual user-facing write index."""
    class Index:
        name = write_example  # alias used for writes

read_example and write_example are aliases, so we can switch the indices behind the aliases while the migration is running, as suggested by the article. That way we can rebuild with no downtime, with no data loss, and without changing the application's code.
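To illustrate, a sketch of what the final alias switch could look like once the new index is populated; a single update_aliases call applies all actions atomically, and the index names here are placeholders:

from elasticsearch_dsl.connections import connections

es = connections.get_connection()

# Point both aliases at the new index in one atomic call, so readers and
# writers never see a half-migrated state.
es.indices.update_aliases(body={
    "actions": [
        {"remove": {"index": "example_old", "alias": "read_example"}},
        {"add": {"index": "example_new", "alias": "read_example"}},
        {"remove": {"index": "example_old", "alias": "write_example"}},
        {"add": {"index": "example_new", "alias": "write_example"}},
    ]
})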

We would also add new commands, build_no_downtime and rebuild_no_downtime, to run the new rebuilding strategy.

The downside of this implementation is that we have to maintain two documents to be able to use it, but I could not figure out another implementation that doesn't require refactoring or rewriting a lot of stuff in elasticsearch-dsl.

What do you guys think about it?

oehrlein commented 3 years ago

@jphilip Please take a look at my pull request #358 and let me know if it works for you!