-
The Apache Storm version used by StormCrawler is 1.2, but the latest Apache Storm release is 2.x or later.
Note that StormCrawler v2.0 uses Storm v2.x (see v1.6 ... v2.0 [comparison](https://github.com/DigitalPebb…
-
This does not happen in remote mode, only when running in local mode.
```
14:32:01.031 [Thread-49-tika-executor[24, 24]] ERROR o.a.s.u.Utils - Async loop died!
java.lang.VerifyError: Stack map does not ma…
```
-
The current API doesn't set hosts. This field is necessary because StormCrawler (the link checker) parallelizes URL calls by host, so a host must be set for every link to check.
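Parallelizing by host means every link needs a host key before it is queued. A minimal sketch of deriving that key, assuming plain `java.net.URI` parsing is enough; `HostPartitioner`, `hostOf` and `groupByHost` are hypothetical names, not part of the StormCrawler or link-checker API:

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of the StormCrawler API): derives the host
// of each link so URLs can be partitioned by host before they are checked.
class HostPartitioner {

    // Returns the lower-cased host of the URL, or null if it has none or cannot be parsed.
    static String hostOf(String url) {
        try {
            String host = new URI(url).getHost();
            return host == null ? null : host.toLowerCase();
        } catch (URISyntaxException e) {
            return null;
        }
    }

    // Groups URLs by host, preserving input order; URLs without a host are skipped.
    static Map<String, List<String>> groupByHost(List<String> urls) {
        Map<String, List<String>> byHost = new LinkedHashMap<>();
        for (String url : urls) {
            String host = hostOf(url);
            if (host == null) {
                continue;
            }
            byHost.computeIfAbsent(host, h -> new ArrayList<>()).add(url);
        }
        return byHost;
    }

    public static void main(String[] args) {
        List<String> links = List.of(
                "https://example.com/a",
                "https://Example.com/b",
                "http://example.org/",
                "not a url");
        System.out.println(groupByHost(links));
    }
}
```

Lower-casing the host keeps `Example.com` and `example.com` in the same group, and URLs without a parsable host are skipped rather than given an invented key.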
-
We have been using StormCrawler with Elasticsearch under heavy load, with parallelism across multiple workers, for some time without any issues. After we upgraded the version from 1.18 to 2.1 (storm 1.2…
-
### Performance degradation using version 2.1.0
We have been experiencing performance degradation with the new version of StormCrawler.
We are pretty sure that the issue is being caused by thi…
-
Hi @jnioche!
We are currently using a custom protocol for fetching that follows redirects. This worked well until we found a case where the redirect is done by a meta refresh tag in the…
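A protocol that follows HTTP redirects will not see a meta refresh, because that redirect only exists in the HTML body. A rough sketch of extracting the target, assuming regular expressions are acceptable for a first pass; `MetaRefresh` and `redirectTarget` are hypothetical names, and this is not how StormCrawler itself handles meta refresh:

```java
import java.net.URI;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not StormCrawler code): simplified, regex-based
// extraction of the target of a <meta http-equiv="refresh"> tag.
class MetaRefresh {

    private static final Pattern META_TAG =
            Pattern.compile("<meta\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    private static final Pattern HTTP_EQUIV =
            Pattern.compile("http-equiv\\s*=\\s*[\"']?refresh\\b", Pattern.CASE_INSENSITIVE);
    // Group 2 holds a double-quoted value, group 3 a single-quoted one.
    private static final Pattern CONTENT =
            Pattern.compile("content\\s*=\\s*(\"([^\"]*)\"|'([^']*)')", Pattern.CASE_INSENSITIVE);
    // Captures the URL after "url=", skipping an optional opening quote.
    private static final Pattern URL_PART =
            Pattern.compile("url\\s*=\\s*['\"]?([^'\"]+)", Pattern.CASE_INSENSITIVE);

    // Returns the redirect target resolved against the page URL, if any.
    static Optional<String> redirectTarget(String pageUrl, String html) {
        Matcher tag = META_TAG.matcher(html);
        while (tag.find()) {
            String t = tag.group();
            if (!HTTP_EQUIV.matcher(t).find()) {
                continue; // not a refresh tag
            }
            Matcher content = CONTENT.matcher(t);
            if (!content.find()) {
                continue;
            }
            String value = content.group(2) != null ? content.group(2) : content.group(3);
            Matcher url = URL_PART.matcher(value);
            if (!url.find()) {
                continue; // refresh without a target, e.g. content="30"
            }
            try {
                return Optional.of(URI.create(pageUrl).resolve(url.group(1).trim()).toString());
            } catch (IllegalArgumentException e) {
                return Optional.empty(); // malformed page URL or target
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String html = "<html><head><meta http-equiv=\"refresh\" content=\"0; url=/new\"></head></html>";
        System.out.println(redirectTarget("https://example.com/old/page", html).orElse("none"));
    }
}
```

Relative targets such as `url=/new` are resolved against the page URL, so `https://example.com/old/page` yields `https://example.com/new`. A DOM-based HTML parser would cope with malformed markup far better than these patterns.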
-
When using the `HybridSpout` with a field from the metadata as the key, I get "cannot cast ArrayList to String" exceptions.
The error is in the following line:
https://github.com/DigitalPebble/stor…
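One way to avoid the cast, assuming the metadata field reaches the spout as a `List` (multi-valued) rather than a `String`, is to normalize the value before using it as the key. `PartitionKeys.firstValue` and the field names below are hypothetical, not the actual `HybridSpout` code:

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper (not StormCrawler code): reads a key from a document's
// source map, where a metadata field may be a single String or a List.
class PartitionKeys {

    // Returns the first value of the field as a String, or null if absent or empty.
    static String firstValue(Map<String, Object> source, String field) {
        Object value = source.get(field);
        if (value == null) {
            return null;
        }
        if (value instanceof List) {
            List<?> values = (List<?>) value;
            return values.isEmpty() || values.get(0) == null ? null : values.get(0).toString();
        }
        return value.toString();
    }

    public static void main(String[] args) {
        // Illustrative field names only.
        Map<String, Object> source = Map.of(
                "hostname", List.of("example.com"),
                "url", "https://example.com/");
        System.out.println(firstValue(source, "hostname"));
        System.out.println(firstValue(source, "url"));
    }
}
```

Taking the first value assumes the key field carries one meaningful value per document; a genuinely multi-valued key would need a different policy.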
-
I'm not sure whether the following line is intentional.
https://github.com/DigitalPebble/storm-crawler/blob/eb735b4f704e86ccdf004ca4dee107f331fad2a4/core/src/main/java/com/digitalpebble/stormcrawl…
-
The parameter `partitionField` in the `HybridSpout` can only be the same field as the one used in the `URLPartitioner`.
The `emptyQueue()` method in the `HybridSpout` filters on the `queueName` mac…
-
```
19633413 [Thread-24-parse-executor[7 7]] INFO c.d.s.b.JSoupParserBolt - Parsing : starting https://indicator.natwest.com:443/
19633417 [Thread-24-parse-executor[7 7]] ERROR o.a.s.util - Async loop d…
```