add support for autogenerated elasticsearch ids

andrewthad commented 6 years ago

This has been suggested before, but in light of improvements made to ES5 and in light of my own experience working on a piece of software that has to store 30,000 documents per second, I wanted to bring it up again. My understanding is that there was originally an unsound autogenerated id optimization in ES1. It was removed in ES2 for obvious reasons. But then in the release notes for ES5, we see:

Optimize indexing for the autogenerated ID append-only case

I haven't dug into how exactly it works, but I have benchmarked (unfortunately on a proprietary application) the difference between automatically generating an ID and specifying it manually. I have observed that our cluster that does 30,000 documents per second with an auto id can only do 22,000 with a manual id. For this reason, we use auto ids. Since elasticsearch doesn't guarantee persistence, we still have to archive them elsewhere as well.

I agree that auto ids are often dangerous, but sometimes they are helpful for performance, and I would like to see them supported by bloodhound. I could add some pretty strong warnings in the haddocks if that would help ease your conscience.

bitemyapp commented 6 years ago

I'm leery, but open to being convinced.

This is the tracking issue on GitHub I believe: https://github.com/elastic/elasticsearch/issues/10708

bitemyapp commented 6 years ago

Please add a comment with a warning including a link to the tracking issue and a test-case. With that in place, I'll happily merge.

andrewthad commented 6 years ago

Will do. Thanks!

andrewthad commented 6 years ago

I've added tests and a warning in the docs.

bitemyapp commented 6 years ago

I've manually integrated this PR into a branch that is being incorporated into the internal reorganization, you can find it here: https://github.com/bitemyapp/bloodhound/tree/faster-compile-auto-id

I think it's done and ready to join the rest of the changes so I consider this PR closed. Thank you!

bitemyapp / bloodhound

add support for autogenerated elasticsearch ids #217