commoncrawl / news-crawl

News crawling with StormCrawler - stores content as WARC
Apache License 2.0
323 stars 35 forks

Crawl-delay in robots.txt should not shrink delay configured by fetcher.server.delay #24

Closed sebastian-nagel closed 5 years ago

sebastian-nagel commented 6 years ago

The news crawler is configured to be polite, with a guaranteed fetch delay of a few seconds. However, some robots.txt files define a crawl-delay below one second, which then overrides the configured delay. The crawler-commons robots.txt parser would allow a delay as short as 1 ms; in practice I've seen a crawl-delay of 200 ms. To keep control, the longer configured delay should take precedence.

Note: Yandex's robots.txt specs allow fractional values for crawl-delay. Examples: bin.ua, vladnews.ru, gov.uk.
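The intended precedence rule can be sketched as follows. This is a minimal illustration, not the actual StormCrawler code; the class and method names are hypothetical, and the 5-second default is only an assumed value for `fetcher.server.delay`:

```java
public class FetchDelayPolicy {

    /**
     * Hypothetical helper illustrating the fix: a robots.txt crawl-delay
     * may lengthen, but never shorten, the configured fetcher.server.delay.
     * A negative robotsCrawlDelayMs means no crawl-delay was declared.
     */
    public static long effectiveDelayMs(long configuredDelayMs, long robotsCrawlDelayMs) {
        if (robotsCrawlDelayMs < 0) {
            // no Crawl-delay directive in robots.txt: keep the configured delay
            return configuredDelayMs;
        }
        // take the maximum so a short crawl-delay (e.g. 200 ms) cannot
        // undercut the politeness delay configured for the fetcher
        return Math.max(configuredDelayMs, robotsCrawlDelayMs);
    }

    public static void main(String[] args) {
        // robots.txt asks for 200 ms, config says 5 s: keep the 5 s delay
        System.out.println(effectiveDelayMs(5000, 200));
        // robots.txt asks for 30 s: honor the longer delay
        System.out.println(effectiveDelayMs(5000, 30000));
    }
}
```

With this rule, a crawl-delay of 200 ms (or even 1 ms) no longer shrinks the guaranteed delay, while sites requesting a longer delay are still respected.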

sebastian-nagel commented 5 years ago

Included with upgrade to StormCrawler 1.12.1 in 2e36397.