elastic / stream2es

Stream data into ES (Wikipedia, Twitter, stdin, or other ESes)

import unreliable due to random loss of data #61

Closed andrekostolany closed 8 years ago

andrekostolany commented 8 years ago

This is a real showstopper and may be related to https://github.com/elastic/stream2es/pull/57. I can reproduce it with a file of about 30 million JSON documents:

gunzip -c /media/sf_common/something.json.gz |stream2es stdin --replace --target "http://localhost:9200/some/thing"

There are no error messages, but sometimes only 3 million documents are imported; another time it imports 7 million documents before it finishes. Does nobody want to fix this? It's rather unusable in this state!

drewr commented 8 years ago

It's likely related, although I'd be surprised if gunzip was introducing much, if any, latency. I'm trying to merge #57 to master but having to update a few things along the way. Stay tuned.

drewr commented 8 years ago

2016050914502743bc58f should have this addressed, along with a perf bump.

drewr commented 8 years ago

@andrekostolany can you try a couple more runs?

andrekostolany commented 8 years ago

I've started the load. I'll report the result later; it may run for a couple of hours...

Thank you so far!

andrekostolany commented 8 years ago

Hi Drew,

looks great 8-) Thank you so far!

bash-4.2$ gunzip -c /media/sf_common/work.json.gz |wc -l
29153263
bash-4.2$ gunzip -c /media/sf_common/work.json.gz |stream2es stdin --replace --target "http://localhost:9200/work/ice"
2016-05-10T08:24:08.002+0000 INFO delete index http://localhost:9200/work
2016-05-10T10:49:41.060+0000 INFO 145:33,033 3338,3d/s 1484,1K/s (12656,7mb) indexed 29153263 streamed 29153263 errors 0
2016-05-10T10:49:41.062+0000 INFO done
bash-4.2$
bash-4.2$ curl -XGET "http://localhost:9200/work/_count"
{"count":29153263,"_shards":{"total":2,"successful":2,"failed":0}}
bash-4.2$
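For anyone who wants to repeat this check on their own data, the comparison above (source line count vs. the stream2es summary line vs. the index `_count`) could be scripted roughly as follows. This is only a sketch: the function names are made up for illustration and are not part of stream2es, the summary-line pattern is taken from the single log line posted in this thread and may differ in other versions, and it assumes one JSON document per line.

```python
import gzip
import json
import re
import urllib.request

# Matches the tail of the stream2es summary line as posted in this thread, e.g.
# "... indexed 29153263 streamed 29153263 errors 0". Assumed format, not a spec.
SUMMARY_RE = re.compile(r"indexed (\d+) streamed (\d+) errors (\d+)")


def count_lines_gz(path):
    """Count lines in a gzip file, assuming one JSON document per line.

    Roughly `gunzip -c path | wc -l`, except that a final line without a
    trailing newline is counted as well.
    """
    with gzip.open(path, "rb") as fh:
        return sum(1 for _ in fh)


def parse_summary(line):
    """Extract the indexed/streamed/errors counters from a summary log line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError("no stream2es summary found in: %r" % line)
    indexed, streamed, errors = (int(g) for g in match.groups())
    return {"indexed": indexed, "streamed": streamed, "errors": errors}


def fetch_count(count_url):
    """GET an Elasticsearch _count URL and return the "count" field."""
    with urllib.request.urlopen(count_url) as resp:
        return json.load(resp)["count"]


def check_import(expected, summary, es_count):
    """Return a list of human-readable mismatches (empty list means all good)."""
    problems = []
    if summary["streamed"] != expected:
        problems.append("streamed %d of %d" % (summary["streamed"], expected))
    if summary["indexed"] != expected:
        problems.append("indexed %d of %d" % (summary["indexed"], expected))
    if summary["errors"] != 0:
        problems.append("%d errors reported" % summary["errors"])
    if es_count != expected:
        problems.append("_count is %d, expected %d" % (es_count, expected))
    return problems
```

With the values from the run above, `check_import(count_lines_gz("/media/sf_common/work.json.gz"), parse_summary(last_log_line), fetch_count("http://localhost:9200/work/_count"))` should return an empty list, where `last_log_line` is whatever line of stream2es output contains the counters. A run that silently dropped documents, as described at the top of this issue, would show up as a `_count` mismatch even though no error was logged.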