WARNING: This is a breaking update, which requires a slight tweak to processor files. Those PRs (which I'll also be making) should be merged in at the same time as this update, and will be listed here shortly.
How one developer from Washington, DC cut his Elasticsearch indexing time in half. Sysadmins hate him!
Two main changes:
Update version of py-elasticsearch
According to the docs, we have been using the wrong version. Since we're all running Elasticsearch > 1.0, we need a 1.x.x release of the Python Elasticsearch wrapper.
Please run pip install -r requirements.txt to pick up the new version.
Switch to using bulk indexing
Previously, we made one request for each document we indexed. With this PR, we use bulk indexing to send them in one go (strictly speaking, in larger chunks, but the same idea applies), which is faster and more efficient.
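For reviewers who haven't used the bulk helpers, here is a minimal sketch of the idea, not the code in this PR. It assumes the 1.x Python client's `elasticsearch.helpers.bulk`; the function names, the `documents` mapping, and the `chunk_size` default are hypothetical and chosen only for illustration.

```python
def bulk_actions(index_name, doc_type, documents):
    """Yield one bulk action per document.

    `documents` is assumed to map document id -> document body.
    The dict shape is the one elasticsearch.helpers.bulk accepts.
    """
    for doc_id, body in documents.items():
        yield {
            "_op_type": "index",  # create or replace, no separate update step
            "_index": index_name,
            "_type": doc_type,
            "_id": doc_id,
            "_source": body,
        }


def index_documents(es, index_name, doc_type, documents, chunk_size=500):
    """Send all documents in chunked bulk requests instead of one request each.

    `es` is assumed to be an elasticsearch.Elasticsearch client.
    """
    # Imported here so bulk_actions can be used without the client installed.
    from elasticsearch import helpers

    actions = bulk_actions(index_name, doc_type, documents)
    # helpers.bulk splits the actions into requests of `chunk_size` each.
    return helpers.bulk(es, actions, chunk_size=chunk_size)
```

With `chunk_size=500`, indexing 2,000 documents takes four bulk requests rather than 2,000 individual ones.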
We're also now using the index method instead of A) trying the create method, then B) falling back on the update method if the document is already there. index creates the document if it doesn't exist and overwrites it if it does, so the fallback is no longer needed.
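For a single document, the difference can be sketched like this. It is an illustration under assumptions rather than the PR's code: `save_document` is a hypothetical name, and `es` is assumed to be a 1.x `elasticsearch.Elasticsearch` client.

```python
def save_document(es, index_name, doc_type, doc_id, body):
    """Create or replace a document with a single index call."""
    # Old flow: call es.create(...), catch the conflict raised when the id
    # already exists, then call es.update(...). That is up to two requests.
    # New flow: es.index(...) writes the document whether or not the id exists.
    return es.index(index=index_name, doc_type=doc_type, id=doc_id, body=body)
```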
Is this change going to production before June 29th? We are planning a release that depends on sheer on that date, with subsequent releases in the three weeks after. @rosskarchner
Review: @rosskarchner @kurtw