skv-headless closed 8 years ago
Yeah, it is definitely planned for a future, separate elasticsearch-dsl gem, which will be part of this repository.
Closing due to radio silence.
Can we reopen this issue, please? It would be great to have it without manually dancing with scroll and bulk apis.
I wouldn't mind seeing the authors give an example of "dancing with scroll and bulk apis", just to compare against my own approach.
Well, the basic choreography for this dance is quite straightforward:
require 'elasticsearch'

client = Elasticsearch::Client.new

# Start from a clean source index (ignore the 404 if it does not exist yet)
client.indices.delete index: 'test', ignore: 404
1_000.times do |i|
  client.index index: 'test', type: 'test', id: i + 1, body: { title: "Test #{i}" }
end
client.indices.refresh index: 'test'

# Open a scan-type scroll; this first response carries only the scroll ID, no hits
r = client.search index: 'test', search_type: 'scan', scroll: '5m', size: 10

# Fetch batches until the scroll is exhausted, bulk-indexing each into 'test-new'
batch = 0
while r = client.scroll(scroll_id: r['_scroll_id'], scroll: '5m') and not r['hits']['hits'].empty? do
  print "--- BATCH #{batch += 1}: "
  b = client.bulk index: 'test-new', type: 'test',
                  body: r['hits']['hits'].map { |d| { index: { _id: d['_id'], data: d['_source'] } } }
  print b['errors'] ? "ERR\n" : "OK\n"
end
Adding a reindex method with this code to the library is really easy. But as soon as you have it, you'll want more. You'll want to parallelize the operation somehow, for example, because otherwise it probably wouldn't saturate Elasticsearch and/or wouldn't be effective. Or you'll want to read from one cluster and index into another one.
Considerations like these prevented me from adding a "simple" reindex method to the library. I'm OK with leaving this issue open, or opening another one, but I'm just pointing out there's a potential for a lot of complexity.
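To make the "read from one cluster, index into another" case concrete, here is a minimal sketch of how the loop above could be wrapped so that the reading and writing clients are separate. The copy_index name and all of its parameters are hypothetical and not part of the elasticsearch gem. It assumes a plain scrolled search (no scan search type), so the first response already carries the first batch of hits, and it runs sequentially, without any parallelization.

```ruby
# Hypothetical helper, NOT part of the elasticsearch gem.
# `source` and `target` are assumed to be Elasticsearch::Client instances,
# possibly pointing at different clusters; any object that responds to
# `search`, `scroll` and `bulk` in the same way will work.
def copy_index(source:, target:, from:, to:, type: nil, batch_size: 100, scroll: '5m')
  stats = { batches: 0, docs: 0, errors: 0 }

  # Plain scrolled search: the first response already contains hits
  response = source.search(index: from, scroll: scroll, size: batch_size)

  until (hits = response['hits']['hits']).empty?
    # One bulk "index" action per document, keeping the original ID
    body = hits.map { |doc| { index: { _id: doc['_id'], data: doc['_source'] } } }

    args = { index: to, body: body }
    args[:type] = type if type # only needed on clusters that still use mapping types
    result = target.bulk(args)

    stats[:batches] += 1
    stats[:docs]    += hits.size
    stats[:errors]  += 1 if result['errors'] # counts batches with at least one failed item

    # Ask the source cluster for the next batch
    response = source.scroll(scroll_id: response['_scroll_id'], scroll: scroll)
  end

  stats
end
```

With two real clients this would be called along the lines of copy_index(source: Elasticsearch::Client.new(host: 'old:9200'), target: Elasticsearch::Client.new(host: 'new:9200'), from: 'test', to: 'test-new'), where the host names are placeholders. Since each batch is read and written in turn, a single call like this is unlikely to saturate either cluster, which is exactly the limitation described above.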
Just a note here, that:
I'll leave a note here with a pointer to the final code/documentation when finished.
I forgot to comment here when I added and refactored the Reindex extension; see the information and example code here: https://github.com/elastic/elasticsearch-ruby/pull/270#issuecomment-220316848
In this post, http://www.elasticsearch.org/blog/changing-mapping-with-zero-downtime/, I read that some APIs provide a reindex() method. It would be great to have it in this gem.