reezer opened this issue 11 years ago
Should be optional but yes I think so!
Should we add a dependency on https://github.com/ekalinin/robots.js?
Maybe. Just a side note: crawlers could also use a sitemap, and robots.js currently doesn't parse those.
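For reference, a minimal sketch of what a robots.txt check could look like with robots.js, based on its documented `RobotsParser` API (the URL, path, and user-agent below are placeholders):

```js
var robots = require('robots');
var parser = new robots.RobotsParser();

// Fetch and parse the site's robots.txt, then ask whether a path may be crawled.
parser.setUrl('http://example.com/robots.txt', function (parser, success) {
  if (!success) {
    return; // robots.txt missing or unreachable
  }
  parser.canFetch('my-crawler', '/some/path', function (access) {
    if (access) {
      // safe to queue http://example.com/some/path
    }
  });
});
```

Sitemap discovery would still need separate handling, since a `Sitemap:` line in robots.txt is just a URL pointing at an XML file that has to be fetched and parsed on its own.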
Hi @sylvinus, I know that this issue is very old already, but is there any support of robots.txt files and/or sitemaps yet?
Hi!
Sorry about that, but I'm not the maintainer anymore.
Best,
@koolma Can you provide more details?
@mike442144 I would like to know whether crawlers implemented with this project respect the robots.txt files on the servers they crawl, and whether they make use of a sitemap to discover URLs.
Not yet; the crawler module is not the kind of spider that fetches web pages for search-engine use. However, I think respecting robots.txt could be added as another option. I don't have much free time right now, so go ahead and add this feature if you need it. We can discuss further if any problems come up. What do you think?
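For whoever picks this up, here is a rough sketch of how such an option could be wired around the existing `queue()` call, using robots.js for the check. The `queueIfAllowed` helper and the user-agent string are hypothetical, not part of the current API:

```js
var Crawler = require('crawler');
var robots = require('robots');
var url = require('url');

var c = new Crawler({
  callback: function (error, res, done) {
    if (!error) {
      console.log(res.statusCode);
    }
    done();
  }
});

// Hypothetical helper: queue the URL only if the site's robots.txt allows it.
function queueIfAllowed(crawler, pageUrl) {
  var u = url.parse(pageUrl);
  var parser = new robots.RobotsParser();
  parser.setUrl(u.protocol + '//' + u.host + '/robots.txt', function (p, success) {
    if (!success) {
      return crawler.queue(pageUrl); // no robots.txt found: assume allowed
    }
    p.canFetch('node-crawler', u.path, function (access) {
      if (access) {
        crawler.queue(pageUrl);
      }
    });
  });
}

queueIfAllowed(c, 'http://example.com/page.html');
```

A built-in option would of course want to cache the parsed robots.txt per host rather than re-fetching it for every URL.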
This one is a beast...
Is supporting robots.txt within the scope of this project?