elastic / stream2es

Stream data into ES (Wikipedia, Twitter, stdin, or other ESes)

Parent/child relationship support #29

Status: Closed (Koka closed 10 years ago)

Koka commented 10 years ago

I've noticed that the current version doesn't support parent/child relationships when reindexing from one ES index to another. I think this is an important case, so here is a patch that fixes it.

drewr commented 10 years ago

Thanks @Koka!

Analect commented 9 years ago

@Koka Would it be possible to give an example of the use-case of the parent-child implementation you added? If I have two fields in one index and I want to reindex using one of the fields as a parent_id, is that possible? Thanks.

Koka commented 9 years ago

@Analect My use-case was simple: my original ES has some parent-child mappings. When I tried to reindex everything to another ES instance, I noticed that all my parent-child relationships were gone. Now stream2es maintains the proper relationships when reindexing from one ES to another.

drewr commented 9 years ago

@Analect Most of the work for parent/child happens at query time. The only real consideration at index time is that the _parent field is retained so that ES can route the document to the proper shard (the one with the parent) when writing to the target index. @Koka's patch simply added that field to the pipeline.
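To make the index-time point concrete, here is a minimal sketch (not stream2es's actual code, which is Clojure) of carrying `_parent` from a source hit into the metadata of a bulk index action. The function name `bulk_lines` is hypothetical, and where a hit exposes its parent (top level or under `fields`) is an assumption that depends on the Elasticsearch version and on how the field was requested.

```python
import json


def bulk_lines(hit, target_index):
    """Build the two NDJSON lines of a bulk index action for one search hit.

    Hypothetical helper for illustration only.
    """
    meta = {"_index": target_index, "_type": hit["_type"], "_id": hit["_id"]}

    # Assumption: the parent id is either a top-level "_parent" key or an
    # entry under "fields", depending on ES version and request options.
    parent = hit.get("_parent") or hit.get("fields", {}).get("_parent")
    if isinstance(parent, list):
        parent = parent[0] if parent else None

    # Keeping _parent in the action metadata lets ES route the child to the
    # shard that holds its parent; without it the relationship is lost.
    if parent is not None:
        meta["_parent"] = parent

    return [json.dumps({"index": meta}), json.dumps(hit["_source"])]
```

Documents without a parent pass through unchanged, so the same helper would work for a mixed stream of parent and child documents.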

When I originally wrote stream2es as a simple Wikipedia river replacement, the es stream wasn't on my radar, and when I later implemented it out of a testing need, I overlooked p/c. Once users started using it to reindex much richer indices, the p/c omission became apparent. Thanks again @Koka for adding it!

Analect commented 9 years ago

Thanks @Koka for the clarification. Have you seen any other tool that helps generate the parent/child relationships in the first place? I'm aware of the bulk API, but ideally I'd like to feed from a MongoDB collection in which one of the fields is used as a parent_id. I had hoped to use the 10genlab/mongo-connector, but it doesn't handle parent-child yet. I also can't get the mongodb-river plugin's scripting method to work (example here: https://github.com/richardwilly98/elasticsearch-river-mongodb/issues/413). Thanks.
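For the bulk API route mentioned above, a rough sketch of deriving `_parent` from a document field might look like the following. This is an illustration under stated assumptions, not a feature of any tool named in this thread: `child_action`, the `parent_id` field name, and the `blog`/`comment` index and type names are hypothetical, and it assumes an Elasticsearch version that still accepts `_parent` in bulk action metadata, with the child type's mapping already declaring its parent type.

```python
import json


def child_action(doc, index, doc_type, id_field="_id", parent_field="parent_id"):
    """Render one document as a bulk index action with an explicit parent.

    Hypothetical helper: the parent id is read from a field in the document.
    """
    # The id is lifted into the action metadata, so drop it from the body.
    source = {k: v for k, v in doc.items() if k != id_field}
    meta = {
        "_index": index,
        "_type": doc_type,
        "_id": str(doc[id_field]),
        # Assumption: the value of parent_field is the id of an existing parent.
        "_parent": str(doc[parent_field]),
    }
    # The bulk API expects newline-delimited JSON: action line, then source.
    return json.dumps({"index": meta}) + "\n" + json.dumps(source) + "\n"


doc = {"_id": "c1", "parent_id": "p1", "text": "hi"}
payload = child_action(doc, "blog", "comment")
```

The resulting `payload` strings could be concatenated and sent to the `_bulk` endpoint; reading the documents out of MongoDB is left out here.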