elastic / stream2es

Stream data into ES (Wikipedia, Twitter, stdin, or other ESes)

Tuning stream2es Guidance #48

Closed packplusplus closed 9 years ago

packplusplus commented 9 years ago

I'm looking for guidance on how to "tune" stream2es. I'm using it to copy indexes into new mappings. I have no idea how many workers I should be running, or whether the default values or my Elasticsearch cluster is the bottleneck. I assume the default values are slowing me down, because the copy rate doesn't seem to change when I have multiple copies going, and I don't see any real CPU or I/O pressure on the cluster. It could also be that I'm running it through an ELB, or it could be the system that's doing the copy (a t2.large in AWS).

Do you have any guidance on what methodology to use other than "try a couple more workers until you don't see any improvements"?
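To make the "add workers until it stops helping" approach a bit more mechanical, here is a minimal sketch in Python. It assumes you have already run the same copy at several worker counts and recorded a docs-per-second figure for each. The function name `pick_worker_count`, the `min_gain` threshold, and the sample numbers are all hypothetical and only for illustration; nothing here is part of stream2es.

```python
def pick_worker_count(throughput_by_workers, min_gain=0.10):
    """Return the largest measured worker count that still gave at least
    `min_gain` relative throughput improvement over the previous count.

    throughput_by_workers: dict mapping worker count -> measured docs/sec
    (measurements you collected yourself; this function does not run anything).
    """
    counts = sorted(throughput_by_workers)
    best = counts[0]
    for prev, cur in zip(counts, counts[1:]):
        prev_rate = throughput_by_workers[prev]
        if prev_rate <= 0:
            # A zero or negative rate means the measurement is unusable.
            break
        gain = (throughput_by_workers[cur] - prev_rate) / prev_rate
        if gain < min_gain:
            # Throughput has plateaued; more workers are not helping.
            break
        best = cur
    return best


# Made-up measurements, for illustration only.
scaling = {1: 1000, 2: 1900, 4: 3500, 8: 3600, 16: 3550}
flat = {1: 1000, 2: 1010, 4: 1005}

print(pick_worker_count(scaling))  # plateau after 4 workers
print(pick_worker_count(flat))     # no gain at all beyond 1 worker
```

If the result is the smallest count you tried, as in the `flat` case, the worker count is probably not the limiting factor, and the load balancer, the client machine, or the cluster itself would be the next things to check.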

drewr commented 9 years ago

Sorry I didn't respond here. Did you figure something out?

packplusplus commented 9 years ago

Not particularly. I had been using stream2es to copy indexes for new mappings. I ran into some weirdness where copying would throw ingestion errors, but ingesting new data with the same mappings wouldn't. I wouldn't exactly say I gave up, but ES support said I might have less trouble using Logstash to copy instead of stream2es. I didn't want to keep cluttering up your queue for a tool that I've put back on the shelf for a bit.

P.S. The slowness I saw with stream2es was when copying indexes over 40-50 GB. With smaller indexes I had awesome luck with it, so thanks for writing it.