evanderkoogh / node-sitemap-stream-parser

A streaming parser for sitemap files, able to handle deeply nested sitemaps containing 100+ million URLs.
Apache License 2.0

[IMP]: Respectation of robots.txt #15

Open YarnSeemannsgarn opened 6 years ago

YarnSeemannsgarn commented 6 years ago

I found an example, https://booking.com/robots.txt, where a sitemap is marked as Disallowed:

Sitemap: https://www.booking.com/sitembk-index-https.xml

User-agent: Baiduspider
Disallow: /sitembk-index-https.xml

I suggest adding an option respectRobotsTxt to the parser, true by default.
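A minimal sketch of what such a check might look like (the function name `isAllowed` and the simplified group matching are my own illustration, not part of this library's API; a real implementation should follow the full Robots Exclusion Protocol, including wildcards and longest-match precedence):

```javascript
// Hypothetical helper: decide whether a path may be fetched for a given
// user agent, based on a simplified reading of robots.txt Disallow rules.
function isAllowed(robotsTxt, userAgent, path) {
  let applies = false; // does the current User-agent group apply to us?
  const disallows = [];
  for (const raw of robotsTxt.split(/\r?\n/)) {
    const line = raw.trim();
    const ua = line.match(/^User-agent:\s*(.+)$/i);
    if (ua) {
      // '*' matches every agent; otherwise match by substring (simplified)
      applies = ua[1] === '*' ||
        userAgent.toLowerCase().includes(ua[1].toLowerCase());
      continue;
    }
    const dis = line.match(/^Disallow:\s*(.*)$/i);
    // An empty Disallow value means "allow everything", so it is skipped.
    if (dis && applies && dis[1]) disallows.push(dis[1]);
  }
  // Disallowed if any collected prefix matches the start of the path.
  return !disallows.some((prefix) => path.startsWith(prefix));
}

// The booking.com example from above:
const robots = [
  'User-agent: Baiduspider',
  'Disallow: /sitembk-index-https.xml',
].join('\n');

console.log(isAllowed(robots, 'Baiduspider', '/sitembk-index-https.xml'));
console.log(isAllowed(robots, 'SomeOtherBot', '/sitembk-index-https.xml'));
```

With respectRobotsTxt enabled, the parser could run a check like this against the sitemap URL before fetching it, using its own (configurable) user-agent string.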

evanderkoogh commented 6 years ago

That seems fair. I'll also make the user-agent configurable at the same time. Thanks for the suggestion!