Open diadistis opened 9 years ago
Thanks for reporting this @diadistis, and sorry for the terrible response time. I've noticed similar, and I've done similar workarounds. I haven't had a chance to do profiling on the internal design to isolate the bottleneck, but I suspect at the very least the single LinkedBlockingQueue that feeds the pipeline is part of it.
I did just push a fix for some extraneous string copying, but it won't speed anything up 8x. If you still have this environment available I'd love to know its effect.
Setup
Problem
I'm running :
I have tried several different options for --bulk-bytes, -w, -d and -q, but always get the same result. I'm getting a constant indexing speed of ~5MB/s, which translates to 4 hours to import the file. While indexing, the elasticsearch cluster is heavily under-utilized and the stream2es server has a single core at 100%. I have done extensive testing to ensure that there are no network or elasticsearch performance issues.
Workaround
My final solution was to run stream2es in parallel (not with -w) to see if that would help. That helped a lot. Now all 6 cores and 12 threads get 100% and the indexing time fell from 4 hours to 35 minutes, but the elasticsearch cluster is still pretty much idle. It seems to me that something in stream2es uses way more CPU than it should.
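For anyone wanting to try the same workaround, here is a minimal sketch of one way to prepare the input for several independent processes. It assumes the input is a newline-delimited file with one document per line, which this thread does not confirm. The function name split_round_robin and the shard file names are made up for illustration and are not part of stream2es.

```python
from pathlib import Path


def split_round_robin(src: Path, out_dir: Path, shards: int) -> list[Path]:
    """Distribute the lines of src across `shards` files, round-robin.

    Assumption: src holds one document per line, so any line can go to
    any shard without breaking a record.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical shard naming; pick whatever suits your setup.
    paths = [out_dir / f"shard-{i:02d}.json" for i in range(shards)]
    handles = [p.open("w", encoding="utf-8") for p in paths]
    try:
        with src.open("r", encoding="utf-8") as lines:
            for n, line in enumerate(lines):
                # Line n goes to shard n mod shards.
                handles[n % shards].write(line)
    finally:
        for h in handles:
            h.close()
    return paths
```

Each shard would then be fed to its own stream2es process, using the same command line as the single-process run. Round-robin keeps the shards close to equal in size, so the processes should finish at roughly the same time.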