Sometimes, when the crawl is finishing and only a few URLs are pending, nextTuple() in the aggregation spout is called steadily (totally expected). If the property es.status.concurrentRequests is set to a value greater than 1 and the property spout.min.delay.queries is too low, you may get this error:
java.io.IOException: Unable to parse response body for Response{requestLine=POST /status*/_search?typed_keys=true&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&preference=_shards%3A8&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true HTTP/1.1, host=https://elasticsearch-coordinating:9200, response=HTTP/1.1 200 OK}
at org.elasticsearch.client.RestHighLevelClient$1.onSuccess(RestHighLevelClient.java:1665) [stormjar.jar:?]
at org.elasticsearch.client.RestClient$FailureTrackingResponseListener.onSuccess(RestClient.java:590) [stormjar.jar:?]
at org.elasticsearch.client.RestClient$1.completed(RestClient.java:333) [stormjar.jar:?]
at org.elasticsearch.client.RestClient$1.completed(RestClient.java:327) [stormjar.jar:?]
at org.apache.http.concurrent.BasicFuture.completed(BasicFuture.java:122) [stormjar.jar:?]
at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.responseCompleted(DefaultClientExchangeHandlerImpl.java:181) [stormjar.jar:?]
at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.processResponse(HttpAsyncRequestExecutor.java:448) [stormjar.jar:?]
at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.inputReady(HttpAsyncRequestExecutor.java:338) [stormjar.jar:?]
at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:265) [stormjar.jar:?]
at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81) [stormjar.jar:?]
at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39) [stormjar.jar:?]
at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:121) [stormjar.jar:?]
at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162) [stormjar.jar:?]
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337) [stormjar.jar:?]
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315) [stormjar.jar:?]
at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276) [stormjar.jar:?]
at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104) [stormjar.jar:?]
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591) [stormjar.jar:?]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
Caused by: java.lang.NumberFormatException: For input string: ""
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) ~[?:1.8.0_252]
at java.lang.Long.parseLong(Long.java:601) ~[?:1.8.0_252]
at java.lang.Long.parseLong(Long.java:631) ~[?:1.8.0_252]
at java.text.DigitList.getLong(DigitList.java:195) ~[?:1.8.0_252]
at java.text.DecimalFormat.parse(DecimalFormat.java:2084) ~[?:1.8.0_252]
at java.text.SimpleDateFormat.subParse(SimpleDateFormat.java:1869) ~[?:1.8.0_252]
at java.text.SimpleDateFormat.parse(SimpleDateFormat.java:1514) ~[?:1.8.0_252]
at java.text.DateFormat.parse(DateFormat.java:364) ~[?:1.8.0_252]
at com.digitalpebble.stormcrawler.elasticsearch.persistence.AggregationSpout.onResponse(AggregationSpout.java:258) ~[stormjar.jar:?]
at com.digitalpebble.stormcrawler.elasticsearch.persistence.AggregationSpout.onResponse(AggregationSpout.java:71)
After some research, we realized that this error happens because two responses are trying to use the same SimpleDateFormat (SDF) instance at the same time, and SimpleDateFormat is not thread-safe. We tried reducing es.status.concurrentRequests to 1 and increasing spout.min.delay.queries, and the error went away.
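To illustrate the suspected cause, here is a minimal sketch of two common ways to parse dates safely from concurrent callbacks. It is not the actual AggregationSpout code: the class name, method names, and the timestamp pattern are assumptions made for the example only.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.time.Instant;
import java.time.format.DateTimeFormatter;
import java.util.Date;

// Hypothetical helper class, not part of StormCrawler.
final class DateParsingSketch {

    // Assumed timestamp pattern, for illustration only.
    private static final String PATTERN = "yyyy-MM-dd'T'HH:mm:ss.SSSX";

    // Alternative 1: DateTimeFormatter is immutable and thread-safe,
    // so a single shared instance can be used by concurrent callbacks.
    private static final DateTimeFormatter ISO = DateTimeFormatter.ISO_INSTANT;

    // Alternative 2: keep SimpleDateFormat, but give each thread its own
    // instance so its internal parsing state is never shared.
    private static final ThreadLocal<SimpleDateFormat> SDF =
            ThreadLocal.withInitial(() -> new SimpleDateFormat(PATTERN));

    static Date parseWithFormatter(String value) {
        return Date.from(ISO.parse(value, Instant::from));
    }

    static Date parseWithThreadLocal(String value) {
        try {
            return SDF.get().parse(value);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + value, e);
        }
    }

    public static void main(String[] args) {
        String sample = "2020-06-01T12:00:00.000Z";
        // Both approaches should yield the same Date for the same input.
        System.out.println(parseWithFormatter(sample).equals(parseWithThreadLocal(sample)));
    }
}
```

Either approach would allow es.status.concurrentRequests to stay above 1, assuming the shared formatter is indeed the only state touched concurrently in onResponse.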
If you want, we can include a fix for this. We have 2 options: