-
Wish: index documents with a parent/child schema:
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-parent-field.html
I tried to implement this using 1.5-SNAPSHOT.
I created t…
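For reference, the `_parent` field page linked above indexes a child document by passing the parent's id as the `parent` request parameter (for example `PUT my_index/my_child/2?parent=1`), which also routes the child to the same shard as its parent. The Java sketch below only builds that request path. The helper name `ParentChildRequest` is hypothetical and is not part of StormCrawler or the Elasticsearch client; defining the mapping and sending the request are left out.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not a StormCrawler or Elasticsearch API): builds the
// REST path that indexes a child document under a given parent id.
final class ParentChildRequest {

    private ParentChildRequest() {}

    // Returns a path such as "/my_index/my_child/2?parent=1".
    static String childIndexPath(String index, String childType,
                                 String childId, String parentId) {
        return "/" + encode(index) + "/" + encode(childType) + "/" + encode(childId)
                + "?parent=" + encode(parentId);
    }

    // Form-encodes a value so ids containing '/', '?' or '&' cannot break the path.
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, StandardCharsets.UTF_8.name());
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(childIndexPath("my_index", "my_child", "2", "1"));
    }
}
```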
-
This would provide a very basic mechanism for backups as well as a simple way to load alternative storage backends.
-
I created a new StormCrawler-based project by following the steps described in [the project's README](https://github.com/DigitalPebble/storm-crawler). Without modifying crawler-conf.yaml, I executed CrawlTopology.java whic…
-
This should allow us to deal with dynamic content. See discussion #142.
Ideally, we'd want to be able to define actions/navigations either programmatically or via configuration.
We could use:
- [se…
-
Conceptually, it is not the same as an ERROR status, which occurs, for instance, when a document is not parsable or has something wrong with it.
A document could become GONE if it has had N consecutive F…
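One way to picture the proposed GONE status is a per-URL counter of consecutive failed fetches that is reset on success and flips the status to GONE once it reaches N. The sketch below is hypothetical: the class name, the `Status` values other than GONE and the in-memory map are illustrative only (the original condition is truncated after "N consecutive F…"). A real status updater would more likely persist the counter alongside the URL, e.g. in its metadata, so that it survives restarts.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed GONE status; names are illustrative.
final class GoneTracker {

    enum Status { FETCHED, FAILED, GONE }

    private final int maxConsecutiveFailures; // the "N" from the proposal
    private final Map<String, Integer> consecutiveFailures = new HashMap<>();

    GoneTracker(int maxConsecutiveFailures) {
        if (maxConsecutiveFailures < 1) {
            throw new IllegalArgumentException("N must be at least 1");
        }
        this.maxConsecutiveFailures = maxConsecutiveFailures;
    }

    // Records the outcome of one fetch attempt and returns the resulting status.
    Status record(String url, boolean fetchSucceeded) {
        if (fetchSucceeded) {
            consecutiveFailures.remove(url); // a success breaks the streak
            return Status.FETCHED;
        }
        int failures = consecutiveFailures.merge(url, 1, Integer::sum);
        return failures >= maxConsecutiveFailures ? Status.GONE : Status.FAILED;
    }

    public static void main(String[] args) {
        GoneTracker tracker = new GoneTracker(3);
        String url = "http://example.com/";
        System.out.println(tracker.record(url, false)); // FAILED
        System.out.println(tracker.record(url, false)); // FAILED
        System.out.println(tracker.record(url, false)); // GONE
    }
}
```

Because only fetch outcomes are recorded, a parsing problem (an ERROR in the sense above) never counts towards the streak, which keeps the two statuses separate.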
-
Removed comments to streamline the issue for the current discussion.
-
In our crawl test, we found that some of the URLs weren't fully encoded for fetching. We get the errors below.
I assume they are caused by '%'.
FYI
FetcherBolt [ERROR] Exception while fetching http://www.h…
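The log line is truncated, so the exact cause cannot be confirmed, but `java.net.URI` rejects a '%' that is not followed by two hexadecimal digits (a malformed escape). If that is what happens here, one workaround is to escape such stray signs as `%25` before fetching while leaving valid escapes like `%20` alone. The helper below is a hypothetical sketch, not StormCrawler's own URL normalization, and the example URLs are made up.

```java
import java.net.URI;

// Hypothetical helper: escapes any '%' that does not start a valid
// percent-escape ("%" followed by two hex digits) as "%25".
final class PercentFixer {

    private PercentFixer() {}

    static String escapeStrayPercents(String url) {
        StringBuilder out = new StringBuilder(url.length());
        for (int i = 0; i < url.length(); i++) {
            char c = url.charAt(i);
            boolean validEscape = c == '%'
                    && i + 2 < url.length()
                    && isHex(url.charAt(i + 1))
                    && isHex(url.charAt(i + 2));
            if (c == '%' && !validEscape) {
                out.append("%25");
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    private static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    public static void main(String[] args) {
        String fixed = escapeStrayPercents("http://example.com/sale/50%off?q=a%20b");
        System.out.println(fixed); // http://example.com/sale/50%25off?q=a%20b
        URI.create(fixed); // no longer throws for a malformed escape
    }
}
```

This treats every stray '%' as a literal percent sign, which is usually, but not always, what the page author meant.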
-
The connection to Elasticsearch does not work if the cluster name is different from "elasticsearch".
The parameter cluster.name: "myclustername" (e.g. es-stormcrawler) has no effect.
It only works if your cl…
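Elasticsearch's default cluster name is `elasticsearch`, so a client that never receives the configured value keeps using that default, which would match the symptom described. The expected behaviour can be sketched as follows: use `cluster.name` whenever it is set and fall back to the default only when it is missing or blank. The class and the flat settings map are hypothetical; how StormCrawler actually passes settings to its Elasticsearch client is not shown in this excerpt.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: resolve the cluster name a client should use.
final class ClusterNameResolver {

    static final String DEFAULT_CLUSTER_NAME = "elasticsearch"; // Elasticsearch's default

    private ClusterNameResolver() {}

    // Uses "cluster.name" when present and non-blank, otherwise the default.
    static String resolve(Map<String, ?> settings) {
        Object value = settings.get("cluster.name");
        if (value == null || value.toString().trim().isEmpty()) {
            return DEFAULT_CLUSTER_NAME;
        }
        return value.toString().trim();
    }

    public static void main(String[] args) {
        Map<String, Object> settings = new HashMap<>();
        System.out.println(resolve(settings)); // elasticsearch
        settings.put("cluster.name", "es-stormcrawler");
        System.out.println(resolve(settings)); // es-stormcrawler
    }
}
```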
-
Hi,
We used StormCrawler for testing in January; at that time, we ran it by building an uber jar, as Flux was not configured then.
Yesterday we updated StormCrawler and it has switched to Flux an…
-
Hi,
We have started testing StormCrawler. We are coming from the WW I (Nutch :)).
1 - Is there any way to pass command-line parameters such as:
maxdepth
checkValidURI etc. which is normally lo…
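Whether StormCrawler accepts such parameters directly is not shown in this excerpt. One generic pattern is to load the YAML configuration first and then let `key=value` arguments override individual entries. The sketch below is hypothetical: the class name is made up, and `maxdepth` and `checkValidURI` are used only because they appear in the question; they are not necessarily real StormCrawler configuration keys.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: "key=value" command-line arguments override values
// that were loaded from a configuration file.
final class ConfigOverrides {

    private ConfigOverrides() {}

    static Map<String, String> apply(Map<String, String> fileConf, String... args) {
        Map<String, String> merged = new HashMap<>(fileConf); // leave the input untouched
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected key=value but got: " + arg);
            }
            merged.put(arg.substring(0, eq), arg.substring(eq + 1));
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> fileConf = new HashMap<>();
        fileConf.put("maxdepth", "2");
        Map<String, String> conf = apply(fileConf, "maxdepth=5", "checkValidURI=true");
        System.out.println(conf.get("maxdepth"));      // 5
        System.out.println(conf.get("checkValidURI")); // true
    }
}
```

Values stay strings here; the topology would still need to parse them, e.g. with `Integer.parseInt` for a depth.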