Closed MyraBaba closed 7 years ago
Hi
I'm afraid not. Those values are read from the file only, so changing them requires recompiling and restarting the topology.
See [https://github.com/DigitalPebble/storm-crawler/wiki/Configuration] and the default values in [https://github.com/DigitalPebble/storm-crawler/blob/master/core/src/main/resources/crawler-default.yaml].
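To illustrate, topologies built on StormCrawler's `ConfigurableTopology` can load an external YAML file at submission time via `-conf`, so values merged that way don't require recompiling the jar. A minimal sketch of such a file (the keys below are examples in the style of crawler-default.yaml; the values are placeholders, not recommendations):

```yaml
# crawler-conf.yaml - loaded at submission with:
#   storm jar crawler.jar com.example.CrawlTopology -conf crawler-conf.yaml
config:
  http.agent.name: "MyCrawler"
  fetcher.server.delay: 1.0
  # any key read from the Storm config by a component or filter
  # can be overridden here without touching the jar
```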
I replied too quickly (blame Christmas) - if you look at [https://github.com/DigitalPebble/storm-crawler/blob/master/core/src/main/java/com/digitalpebble/stormcrawler/filtering/URLFilter.java#L40] you'll see that the URL filters receive the configuration, which gets overridden by any key/values passed on the command line. In theory, then, the settings in the JSON file could be overridden by the config. The trouble is that the filters do not necessarily implement this, e.g. [https://github.com/DigitalPebble/storm-crawler/blob/master/core/src/main/java/com/digitalpebble/stormcrawler/filtering/depth/MaxDepthFilter.java].
This could be implemented, of course, or you could extend a variant of the provided filters to handle it.
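To make the override pattern concrete, here is a self-contained sketch (deliberately not using the real StormCrawler `URLFilter` interface, whose `configure` takes the Storm conf map plus a `JsonNode` of filter params): a depth filter that prefers a key from the topology config over its JSON-file default. The key name `maxdepth.filter.max.depth` is hypothetical; pick whatever key your extended filter documents.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a depth filter whose limit can be overridden via the Storm
// config (i.e. values passed at submission time), falling back to the
// default parsed from the filter's JSON configuration.
public class ConfigurableMaxDepthFilter {
    private int maxDepth;

    // stormConf stands in for the topology configuration; jsonDefault
    // stands in for the value read from urlfilters.json.
    public void configure(Map<String, Object> stormConf, int jsonDefault) {
        Object override = stormConf.get("maxdepth.filter.max.depth"); // hypothetical key
        maxDepth = (override instanceof Number)
                ? ((Number) override).intValue()
                : jsonDefault;
    }

    // Keep a URL whose depth does not exceed the limit; a negative
    // limit means "no limit".
    public boolean keep(int depth) {
        return maxDepth < 0 || depth <= maxDepth;
    }

    public static void main(String[] args) {
        ConfigurableMaxDepthFilter f = new ConfigurableMaxDepthFilter();
        Map<String, Object> conf = new HashMap<>();

        f.configure(conf, 3);          // no override -> JSON default of 3
        System.out.println(f.keep(4)); // depth 4 > 3, filtered out

        conf.put("maxdepth.filter.max.depth", 10);
        f.configure(conf, 3);          // config overrides the JSON default
        System.out.println(f.keep(4)); // depth 4 <= 10, kept
    }
}
```

The point is only the lookup order in `configure`: check the topology config first, fall back to the JSON value, so the same jar behaves differently depending on what was passed at submission.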
Thanks, will look into it and let you know.
Hi,
We started to test StormCrawler. We are coming from WW I (Nutch :)).
1 - Is there any way to pass parameters such as maxdepth, checkValidURI, etc. on the command line? They are normally located in the resources folder; if we change the file and recompile (the big jar) it works, but is there any solution that doesn't require recompiling?
2 - Where can we find the full list of parameters we can use to configure all aspects of the crawler?
thx