elastic / stream2es

Stream data into ES (Wikipedia, Twitter, stdin, or other ESes)

Out of memory error with attachments #24

Closed: arowla closed this issue 10 years ago.

arowla commented 10 years ago

I saw closed issue #15 about an out-of-memory error, but I just built stream2es from master, so this seems to be a different issue. I am using ES 1.1.0 with the mapper attachments plugin. If I stream my document type that doesn't contain attachments, stream2es works fine, but if I stream the document type that does contain attachments, or the whole index, I get the out-of-memory error shown below.

 $ ./stream2es es --source http://localhost:9200/fbopen/opp_attachment --target http://localhost:9200/fbopen2/opp_attachment
stream es from http://localhost:9200/fbopen/opp_attachment to http://localhost:9200/fbopen2/opp_attachment
clojure.lang.ExceptionInfo: clj-http: status 500 {:object {:trace-redirects ["http://localhost:9200/_search/scroll"], :request-time 5074, :status 500, :headers {"content-type" "application/json; charset=UTF-8", "content-length" "58"}, :body "{\"error\":\"OutOfMemoryError[Java heap space]\",\"status\":500}"}, :environment {client #<client$wrap_output_coercion$fn__986 clj_http.client$wrap_output_coercion$fn__986@62897819>, req {:request-method :get, :url "http://localhost:9200/_search/scroll", :body "c2Nhbjs1OzYzMTY6V3hlZVcydC1TN3FmWnhJazlBZXdOZzs2MzE1Old4ZWVXMnQtUzdxZlp4SWs5QWV3Tmc7NjMxODpXeGVlVzJ0LVM3cWZaeElrOUFld05nOzYzMTk6V3hlZVcydC1TN3FmWnhJazlBZXdOZzs2MzE3Old4ZWVXMnQtUzdxZlp4SWs5QWV3Tmc7MTt0b3RhbF9oaXRzOjY1NTs=", :query-params {:scroll "15s"}}, map__908 {:trace-redirects ["http://localhost:9200/_search/scroll"], :request-time 5074, :status 500, :headers {"content-type" "application/json; charset=UTF-8", "content-length" "58"}, :body "{\"error\":\"OutOfMemoryError[Java heap space]\",\"status\":500}"}, resp {:trace-redirects ["http://localhost:9200/_search/scroll"], :request-time 5074, :status 500, :headers {"content-type" "application/json; charset=UTF-8", "content-length" "58"}, :body "{\"error\":\"OutOfMemoryError[Java heap space]\",\"status\":500}"}, status 500}}
        at clj_http.client$wrap_exceptions$fn__907.invoke(client.clj:111)
        at clj_http.client$wrap_accept$fn__1027.invoke(client.clj:380)
        at clj_http.client$wrap_accept_encoding$fn__1033.invoke(client.clj:394)
        at clj_http.client$wrap_content_type$fn__1022.invoke(client.clj:370)
        at clj_http.client$wrap_form_params$fn__1071.invoke(client.clj:481)
        at clj_http.client$wrap_nested_params$fn__1089.invoke(client.clj:505)
        at clj_http.client$wrap_method$fn__1066.invoke(client.clj:464)
        at clj_http.cookies$wrap_cookies$fn__518.invoke(cookies.clj:118)
        at clj_http.links$wrap_links$fn__548.invoke(links.clj:50)
        at clj_http.client$wrap_unknown_host$fn__1098.invoke(client.clj:524)
        at clj_http.client$get.doInvoke(client.clj:615)
        at clojure.lang.RestFn.invoke(RestFn.java:423)
        at stream2es.es$scroll_STAR_.invoke(es.clj:64)
        at stream2es.es$scroll.invoke(es.clj:72)
        at stream2es.es$scan.invoke(es.clj:96)
        at stream2es.stream.es$make_callback$fn__1771.invoke(es.clj:69)
        at stream2es.main$stream_BANG_.invoke(main.clj:299)
        at stream2es.main$main.invoke(main.clj:425)
        at stream2es.main$_main.doInvoke(main.clj:446)
        at clojure.lang.RestFn.applyTo(RestFn.java:137)
        at stream2es.main.main(Unknown Source)
stream error: clojure.lang.ExceptionInfo: clj-http: status 500 {:object {:trace-redirects ["http://localhost:9200/_search/scroll"], :request-time 5074, :status 500, :headers {"content-type" "application/json; charset=UTF-8", "content-length" "58"}, :body "{\"error\":\"OutOfMemoryError[Java heap space]\",\"status\":500}"}, :environment {client #<client$wrap_output_coercion$fn__986 clj_http.client$wrap_output_coercion$fn__986@62897819>, req {:request-method :get, :url "http://localhost:9200/_search/scroll", :body "c2Nhbjs1OzYzMTY6V3hlZVcydC1TN3FmWnhJazlBZXdOZzs2MzE1Old4ZWVXMnQtUzdxZlp4SWs5QWV3Tmc7NjMxODpXeGVlVzJ0LVM3cWZaeElrOUFld05nOzYzMTk6V3hlZVcydC1TN3FmWnhJazlBZXdOZzs2MzE3Old4ZWVXMnQtUzdxZlp4SWs5QWV3Tmc7MTt0b3RhbF9oaXRzOjY1NTs=", :query-params {:scroll "15s"}}, map__908 {:trace-redirects ["http://localhost:9200/_search/scroll"], :request-time 5074, :status 500, :headers {"content-type" "application/json; charset=UTF-8", "content-length" "58"}, :body "{\"error\":\"OutOfMemoryError[Java heap space]\",\"status\":500}"}, resp {:trace-redirects ["http://localhost:9200/_search/scroll"], :request-time 5074, :status 500, :headers {"content-type" "application/json; charset=UTF-8", "content-length" "58"}, :body "{\"error\":\"OutOfMemoryError[Java heap space]\",\"status\":500}"}, status 500}}
streamed 0 indexed 0 bytes xfer 0 errors null
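The scroll ID that stream2es sent in the failing `_search/scroll` request (the `:body` value in the log) is plain base64, and decoding it is a quick way to see what Elasticsearch was asked to return. A small diagnostic sketch, assuming GNU coreutils `base64`; the field layout noted in the comments is inferred from this particular ID:

```shell
#!/bin/sh
# Scroll ID copied verbatim from the :body of the failing
# GET /_search/scroll request in the log above.
SCROLL_ID='c2Nhbjs1OzYzMTY6V3hlZVcydC1TN3FmWnhJazlBZXdOZzs2MzE1Old4ZWVXMnQtUzdxZlp4SWs5QWV3Tmc7NjMxODpXeGVlVzJ0LVM3cWZaeElrOUFld05nOzYzMTk6V3hlZVcydC1TN3FmWnhJazlBZXdOZzs2MzE3Old4ZWVXMnQtUzdxZlp4SWs5QWV3Tmc7MTt0b3RhbF9oaXRzOjY1NTs='

# Decode it (GNU coreutils; older macOS base64 uses -D instead of -d).
decoded=$(printf '%s' "$SCROLL_ID" | base64 -d)
echo "$decoded"

# Field 1 is the search type, field 2 the number of shard contexts.
search_type=$(printf '%s' "$decoded" | cut -d';' -f1)
shards=$(printf '%s' "$decoded" | cut -d';' -f2)
# The trailing total_hits:N entry records how many documents matched.
total_hits=$(printf '%s' "$decoded" | sed -n 's/.*total_hits:\([0-9]*\).*/\1/p')

echo "search type: $search_type, shards: $shards, total hits: $total_hits"
```

For this ID it reports a `scan` over 5 shard contexts with 655 matching documents.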
drewr commented 10 years ago

Interesting, @arowla! I just saw this. If you can give me a script that indexes a small amount of data and reliably triggers the issue, that would help a lot. Before you do that, though, have you tried setting --scroll-size to something smaller, like 10? The default is 500, which may be too large if your documents are large.
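The "too large" point can be made concrete with a little arithmetic. In Elasticsearch 1.x, a `scan` search applies `size` to each shard, so one scroll batch can return up to size × shards documents. The sketch below assumes stream2es passes --scroll-size through as that per-shard size; the shard and hit counts come from the scroll ID in the log, which base64-decodes to `scan;5;...;total_hits:655;`. The `max_batch_docs` helper is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: worst-case documents in one ES 1.x scan/scroll
# batch. size is applied per shard, and a batch can never exceed the
# number of matching documents.
# Assumption: stream2es sends --scroll-size as the scan's size.
max_batch_docs() {
  scroll_size=$1
  shards=$2
  total_hits=$3
  per_batch=$((scroll_size * shards))
  if [ "$per_batch" -gt "$total_hits" ]; then
    per_batch=$total_hits
  fi
  echo "$per_batch"
}

# 5 shards and 655 hits, as encoded in the scroll ID from the log.
echo "default (500): $(max_batch_docs 500 5 655) docs per batch"
echo "suggested (10): $(max_batch_docs 10 5 655) docs per batch"
```

Under that assumption, the default bound (2500) exceeds the 655 matching documents, so a single batch could carry every attachment-bearing document in the type at once, each typically with its base64 attachment content in `_source`. That would plausibly explain the heap exhaustion; at 10 the bound drops to 50.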

vasergen commented 10 years ago

I had the same issue as @arowla. Decreasing the --scroll-size parameter helped.

drewr commented 10 years ago

Thanks, @vasergen. FYI, @arowla, you can adjust these two options to work around this:

% stream2es es --help | fgrep scroll
   --scroll-size Source scroll size (default: 500)
   --scroll-time Source scroll context TTL (default: "15s")
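Applied to the original report, a re-run with a smaller batch might look like the sketch below. Only `--source`, `--target`, and `--scroll-size` come from this thread; the `copy_type` function and its `DRY_RUN` switch are hypothetical conveniences for copying several types, not part of stream2es.

```shell
#!/bin/sh
# Hypothetical helper (not part of stream2es): copy one index/type with
# a small scroll batch. DRY_RUN=1 prints the command instead of running it.
copy_type() {
  src=$1
  dst=$2
  size=${3:-10}   # drewr's suggested starting point
  set -- ./stream2es es --source "$src" --target "$dst" --scroll-size "$size"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Same source and target as the original report; drop DRY_RUN=1 to run it.
DRY_RUN=1 copy_type \
  http://localhost:9200/fbopen/opp_attachment \
  http://localhost:9200/fbopen2/opp_attachment \
  10
```

If 10 still fails, lowering it further trades throughput for a smaller response per scroll request.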