Koka closed this 10 years ago
Thanks @Koka!
@Koka Would it be possible to give an example of the use-case of the parent-child implementation you added? If I have two fields in one index and I want to reindex using one of the fields as a parent_id, is that possible? Thanks.
@Analect My use-case was simple: my original ES has some parent-child mappings. When I tried to reindex everything to another ES instance, I noticed that all my parent-child relationships were gone. Now stream2es maintains the proper relationships while reindexing from one ES to another.
@Analect Most of the work for parent/child happens at query time. The only real consideration at index time is that the _parent field is retained so that ES can route the document to the proper shard (the one with the parent) when writing to the target index. @Koka's patch simply added that field to the pipeline.
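To make the index-time point concrete, here is a minimal Python sketch of carrying _parent from a source hit into the bulk action written to the target index. It is not stream2es code: the function name `hit_to_bulk_lines` is hypothetical, and the hit shape is an assumption (the parent id is looked up either at the top level of the hit or under `fields`, depending on how the source ES returns it).

```python
import json


def hit_to_bulk_lines(hit, target_index):
    """Turn one search hit into the two NDJSON lines of a bulk 'index' action.

    Hypothetical helper for illustration only; not part of stream2es.
    """
    meta = {"_index": target_index, "_type": hit["_type"], "_id": hit["_id"]}

    # Assumption: the parent id sits either at the top level of the hit
    # or under "fields", possibly wrapped in a list.
    parent = hit.get("_parent")
    if parent is None:
        parent = hit.get("fields", {}).get("_parent")
    if isinstance(parent, list):
        parent = parent[0] if parent else None

    # Keep _parent in the action metadata so the target ES can route the
    # child to the shard that holds its parent.
    if parent is not None:
        meta["_parent"] = parent

    return [json.dumps({"index": meta}), json.dumps(hit["_source"])]


# Made-up sample hits: one child document and one parent document.
child_hit = {"_index": "old", "_type": "comment", "_id": "c1",
             "_parent": "p1", "_source": {"text": "hi"}}
parent_hit = {"_index": "old", "_type": "post", "_id": "p1",
              "_source": {"title": "hello"}}

bulk_body = "\n".join(
    hit_to_bulk_lines(child_hit, "new") + hit_to_bulk_lines(parent_hit, "new")
) + "\n"
```

Documents without a parent pass through unchanged, so the same path can handle both sides of the relationship.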
When I originally wrote stream2es as a simple Wikipedia river replacement, the es stream wasn't on my radar, and when I later implemented it out of a testing need, I had overlooked p/c. Then when users started using it to reindex much richer indices, the p/c omission became apparent. Thanks again @Koka for adding it!
Thanks @Koka for the clarification. Have you seen any other tool that helps in generating the parent/child relationships in the first place? I'm aware of the bulk API, but ideally I'm trying to feed from a mongodb collection, where one of the fields is used as a parent_id. I had hoped to use the 10genlab/mongo-connector, but it doesn't handle parent-child yet. Also, I can't get the mongodb-river plugin's scripting method to work (example here: https://github.com/richardwilly98/elasticsearch-river-mongodb/issues/413). Thanks.
I've noticed that the current version doesn't support parent/child relationships while reindexing from one ES index to another. I think this might be an important case, so here is a patch that fixes it.