EvanHahn / cyborg.txt

robots.txt utilities for Node
https://www.npmjs.org/package/cyborg.txt
MIT License

Pattern-matching rules #3

Open sanderheilbron opened 9 years ago

sanderheilbron commented 9 years ago

@EvanHahn do you have plans to support pattern-matching rules (for web crawlers), like the `*` wildcard and the `$` end-of-URL anchor?
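For example, rules along these lines (illustrative; the `*` and `$` patterns are the ones described in Google's documentation linked below):

```
User-agent: *
Disallow: /*.gif$
Disallow: /private*/
Allow: /*/public/
```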

More information about pattern-matching rules is available here:
https://support.google.com/webmasters/answer/6062596?hl=en&ref_topic=6061961
https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt

EvanHahn commented 9 years ago

Is this a feature for webmasters or for authors of web crawlers?

sanderheilbron commented 9 years ago

Both. For web crawlers, to check which URL patterns they are not allowed to crawl, and for webmasters, when they have to generate these patterns inside robots.txt.

URL patterns are supported by search engines like Google, Bing and Yandex.

EvanHahn commented 9 years ago

I think webmasters already get this support; they can just add those special characters.

For web crawlers, would you like a regular expression that can parse these?
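Something like this, maybe (a rough sketch only, not part of the library; `patternToRegExp` is a made-up name):

```js
// Convert a robots.txt path pattern to a RegExp.
// `*` matches any sequence of characters; a trailing `$` anchors the end of the URL.
// Rough sketch; not cyborg.txt's actual API.
function patternToRegExp(pattern) {
  var source = pattern
    .split('*')
    .map(function (part) {
      // Escape regex metacharacters in the literal parts.
      return part.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
    })
    .join('.*');
  // A trailing (escaped) `$` means "end of URL"; otherwise match a prefix.
  if (source.slice(-2) === '\\$') {
    source = source.slice(0, -2) + '$';
  }
  return new RegExp('^' + source);
}

patternToRegExp('/*.gif$').test('/images/photo.gif'); // => true
patternToRegExp('/private*/').test('/private-files/'); // => true
```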

sanderheilbron commented 9 years ago

It would be nice if the current allows and disallows methods supported these rules. Is that possible?
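For a crawler, the check could then look something like this (purely illustrative, reusing the patternToRegExp sketch above; none of these names exist in the library):

```js
// Hypothetical rule list and matcher; names are made up for illustration.
var rules = [
  { type: 'disallow', pattern: '/*.gif$' },
  { type: 'allow', pattern: '/images/public*' }
];

// The most specific (longest) matching rule wins, as Google does;
// if nothing matches, the URL is allowed.
function isAllowed(path, rules) {
  var winner = null;
  rules.forEach(function (rule) {
    if (patternToRegExp(rule.pattern).test(path)) {
      if (!winner || rule.pattern.length > winner.pattern.length) {
        winner = rule;
      }
    }
  });
  return !winner || winner.type === 'allow';
}

isAllowed('/images/public/logo.gif', rules); // => true (the allow rule is longer)
isAllowed('/banner.gif', rules);             // => false
```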

EvanHahn commented 9 years ago

I'm terribly busy at the moment. Would you be able to submit a pull request? Feel free to rewrite as much of the code as you like.

If not, I'll get to it eventually, but it won't happen any time soon.

sanderheilbron commented 9 years ago

Time is also an issue for me. When I find some time, I will dive into it.

EvanHahn commented 9 years ago

Thank you!

EvanHahn commented 8 years ago

I've stopped maintaining this library. See issue #4 for further discussion on this.