elastic / elasticsearch-ruby

Ruby integrations for Elasticsearch
https://www.elastic.co/guide/en/elasticsearch/client/ruby-api/current/index.html
Apache License 2.0

reindex method #68

Closed skv-headless closed 8 years ago

skv-headless commented 10 years ago

In this post http://www.elasticsearch.org/blog/changing-mapping-with-zero-downtime/ I read that some APIs provide a reindex() method. It would be great to have it in this gem.

karmi commented 10 years ago

Yeah, it is definitely planned for a future, separate elasticsearch-dsl gem, which will be part of this repository.

karmi commented 10 years ago

Closing due to radio silence.

somebody32 commented 9 years ago

Can we reopen this issue, please? It would be great to have it without manually dancing with scroll and bulk apis.

apple-corps commented 8 years ago

I wouldn't mind seeing the authors give an example of "dancing with scroll and bulk apis", just to compare against my own approach.

karmi commented 8 years ago

Well, the basic choreography for this dance is quite straightforward:

require 'elasticsearch'

client = Elasticsearch::Client.new

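# Start from a clean index; `ignore: 404` avoids an error when it doesn't exist yet
client.indices.delete index: 'test', ignore: 404
# Index 1,000 small test documents
1_000.times { |i| client.index index: 'test', type: 'test', id: i+1, body: { title: "Test #{i}" } }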
client.indices.refresh index: 'test'

# With search_type 'scan', the initial response carries only the scroll ID, no hits
r = client.search index: 'test', search_type: 'scan', scroll: '5m', size: 10

batch = 0
while r = client.scroll(scroll_id: r['_scroll_id'], scroll: '5m') and not r['hits']['hits'].empty? do
  print "--- BATCH #{batch += 1}: "

  b = client.bulk index: 'test-new', type: 'test',
                  body: r['hits']['hits'].map { |d| { index: { _id: d['_id'], data: d['_source'] } } }
  print b['errors'] ? "ERR\n" : "OK\n"
end

Adding a reindex method with this code to the library is really easy. But as soon as you have it, you'll want more. For example, you'll want to parallelize the operation somehow, because otherwise it probably won't saturate Elasticsearch and won't be efficient. You'll also want to read from one cluster and index into another one.

Considerations like these prevented me from adding a "simple" reindex method to the library. I'm OK with leaving this issue open, or opening another one, but I'm just pointing out there's a potential for a lot of complexity.
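
To make the two complications mentioned above concrete (parallel indexing, and separate source and target clusters), here is a minimal sketch, not the library's implementation. The method name `parallel_reindex` and all of its parameters are hypothetical. `source` and `target` are assumed to be objects that respond to `search`, `scroll` and `bulk` the way `Elasticsearch::Client` does. It also assumes a plain scroll (no `search_type: 'scan'`), where the first response already contains hits, and a cluster that does not require a `type` in bulk requests.

```ruby
# Hypothetical helper: scroll through `from` on the source client and
# bulk-index each batch into `to` on the target client, using worker threads.
def parallel_reindex(source:, target:, from:, to:, workers: 2, batch_size: 100, scroll: '5m')
  queue  = SizedQueue.new(workers * 2) # bounded, so the reader cannot run far ahead
  failed = Queue.new                   # thread-safe collector for failed batches

  threads = Array.new(workers) do
    Thread.new do
      # A nil batch is the stop signal for this worker
      while (hits = queue.pop)
        begin
          ops = hits.map { |d| { index: { _id: d['_id'], data: d['_source'] } } }
          response = target.bulk(index: to, body: ops)
          failed << response if response['errors']
        rescue StandardError => e
          failed << e # keep the worker alive so the reader never blocks forever
        end
      end
    end
  end

  total = 0
  begin
    # Plain scroll: the initial search response already carries the first batch
    r = source.search(index: from, scroll: scroll, size: batch_size)
    until r['hits']['hits'].empty?
      total += r['hits']['hits'].size
      queue << r['hits']['hits']
      r = source.scroll(scroll_id: r['_scroll_id'], scroll: scroll)
    end
  ensure
    workers.times { queue << nil } # one stop signal per worker
    threads.each(&:join)
  end

  { documents: total, failed_batches: failed.size }
end
```

With two real clients this would be called along the lines of `parallel_reindex(source: Elasticsearch::Client.new(url: old_url), target: Elasticsearch::Client.new(url: new_url), from: 'test', to: 'test-new')`, where `old_url` and `new_url` are placeholders. Even this small version has to decide what to do about failed batches, back-pressure and thread counts, which is the kind of complexity described above.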

karmi commented 8 years ago

Just a note here, that:

  1. Support for Elasticsearch's core "Reindex" API has been added in 1cf5fb9
  2. I'm working on merging #270, which effectively brings reindexing to older Elasticsearch versions.

I'll leave a note here with a pointer to the final code/documentation when finished.
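
For readers on a cluster that has the server-side Reindex API (Elasticsearch 2.3 and later), the call from item 1 could look roughly like the sketch below. The wrapper `reindex_via_api` and its parameters are hypothetical names used for illustration. The only client call it relies on is `client.reindex` with a `body` holding `source` and `dest`.

```ruby
# Hypothetical wrapper around the client's `reindex` method, which sends the
# request to Elasticsearch's server-side Reindex API.
def reindex_via_api(client, from:, to:, query: nil)
  source = { index: from }
  source[:query] = query if query # optionally copy only matching documents

  client.reindex(body: { source: source, dest: { index: to } })
end
```

Assuming `client` is an `Elasticsearch::Client`, `reindex_via_api(client, from: 'test', to: 'test-new')` copies every document, and passing `query: { match: { title: 'Test' } }` restricts the copy. All the scrolling and bulk indexing then happens inside Elasticsearch rather than in Ruby.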

karmi commented 8 years ago

I forgot to comment here when I added and refactored the Reindex extension. See the information and example code here: https://github.com/elastic/elasticsearch-ruby/pull/270#issuecomment-220316848