apache / incubator-stormcrawler

A scalable, mature and versatile web crawler based on Apache Storm
https://stormcrawler.apache.org/
Apache License 2.0

Setting "maxDepth": 0 in urlfilters.json prevents ES seed injection #959

Closed: orliac closed this issue 2 years ago

orliac commented 2 years ago

With StormCrawler 2.3-SNAPSHOT, setting "maxDepth": 0 in urlfilters.json prevents the seeds from being injected into the ES index.

Expected behavior would be that the seeds would be injected and crawled with no redirection.

jnioche commented 2 years ago

Thanks @orliac, this should now be fixed. I simply removed the special rule for max depth 0.
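Once the special case is gone, a depth limit of 0 should behave like any other limit. The seeds sit at depth 0 and pass, while every discovered outlink (depth 1 or more) is dropped. Below is a minimal sketch of that rule. `DepthFilterSketch` is a hypothetical, simplified class and not StormCrawler's actual filter API. It also assumes that seeds have no parent depth and that a negative limit means "unlimited".

```java
/**
 * Hypothetical, simplified depth filter (NOT StormCrawler's real API).
 * Assumption: seeds have no parent (parentDepth == null) and are at depth 0;
 * an outlink is one level deeper than the page it was found on.
 */
public class DepthFilterSketch {
    // Assumption: a negative value disables the depth limit.
    private final int maxDepth;

    public DepthFilterSketch(int maxDepth) {
        this.maxDepth = maxDepth;
    }

    /** Returns the URL if it is kept, or null if it is filtered out. */
    public String filter(Integer parentDepth, String url) {
        if (maxDepth < 0) {
            return url; // no limit configured
        }
        // Seeds have no parent, so they are at depth 0; outlinks are parent + 1.
        int depth = (parentDepth == null) ? 0 : parentDepth + 1;
        // No special case for maxDepth == 0: seeds (depth 0) pass, outlinks do not.
        return depth <= maxDepth ? url : null;
    }

    public static void main(String[] args) {
        DepthFilterSketch f = new DepthFilterSketch(0);
        System.out.println(f.filter(null, "https://example.com/"));   // seed kept
        System.out.println(f.filter(0, "https://example.com/page")); // outlink dropped, prints "null"
    }
}
```

With `maxDepth` set to 0, this yields the behaviour the reporter expected: the seeds are injected and fetched, and nothing beyond them is followed.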