pipaltree closed this issue 3 years ago
This occurred because robots.txt is redirecting to robots.txt/ (with a trailing slash). Since the file is expected at a standard location, the crawler tried to read it from robots.txt (no slash) and failed, because that URL does not return the expected content. I just deployed a new 2.9.1-SNAPSHOT version that will follow the redirect. Please try it and confirm.
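To illustrate what "following the redirect" means here, this is a minimal sketch of how a redirect target can be resolved against the originally requested robots.txt URL. It is not the code shipped in the snapshot; the class name, the method name, and the example.com URL are assumptions made for illustration only.

```java
import java.net.URI;

// Hypothetical helper, not part of Norconex.
class RobotsRedirect {

    /**
     * Returns the URL to fetch next, or null when the response
     * is not a redirect or carries no Location header.
     */
    static String nextLocation(String requestedUrl, int status, String locationHeader) {
        boolean redirect = status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
        if (!redirect || locationHeader == null) {
            return null;
        }
        // The Location header may be relative, so resolve it
        // against the URL that was originally requested.
        return URI.create(requestedUrl).resolve(locationHeader).toString();
    }

    public static void main(String[] args) {
        // Assumed scenario: /robots.txt answers 301 with Location: /robots.txt/
        System.out.println(
                nextLocation("http://example.com/robots.txt", 301, "/robots.txt/"));
    }
}
```

A crawler that does not perform this step reads the body of the redirect response itself, which is not a valid robots.txt, and so sees no rules or sitemap entries.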
Thanks for your reply and for providing the snapshot! At the moment I have plenty of work on my hands, but I will try it as soon as possible and give you feedback.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
On a site with a sitemap path specified in robots.txt, Norconex doesn't recognize this specification. The SitemapResolverFactory is configured to respect only specifications from robots.txt by setting an empty path tag:
Content of the robots.txt:
I have enabled log level DEBUG for the collector in log4j.properties:
log4j.logger.com.norconex.collector.http=DEBUG
But when I start crawling, the log says "No sitemap paths specified.":
What could be the issue here?
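For anyone debugging a similar case, here is a minimal sketch of how Sitemap directives are commonly extracted from robots.txt content. It can be used to check whether the content actually returned by the server contains a usable Sitemap line. This is not Norconex's implementation; the class and method names and the sample content are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Norconex.
class RobotsSitemaps {

    /** Collects the URLs of all "Sitemap:" lines found in the given robots.txt text. */
    static List<String> sitemapUrls(String robotsTxt) {
        List<String> urls = new ArrayList<>();
        for (String line : robotsTxt.split("\\r?\\n")) {
            // Drop trailing comments and surrounding whitespace.
            // Simplification: a '#' inside the URL is treated as a comment too.
            int hash = line.indexOf('#');
            String text = (hash >= 0 ? line.substring(0, hash) : line).trim();
            // The directive name is matched case-insensitively.
            if (text.regionMatches(true, 0, "sitemap:", 0, 8)) {
                String url = text.substring(8).trim();
                if (!url.isEmpty()) {
                    urls.add(url);
                }
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        // Assumed sample content, not the robots.txt from this issue.
        String sample = "User-agent: *\n"
                + "Disallow: /private/\n"
                + "Sitemap: http://example.com/sitemap.xml\n";
        System.out.println(sitemapUrls(sample));
    }
}
```

If a check like this finds no URL in the body the crawler received (for example because the request was redirected and the redirect was not followed), a "No sitemap paths specified." message would be the expected outcome.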