-
https://en.wikipedia.org/wiki/Robots_exclusion_standard
Any request that isn't allowed by robots.txt should be reported as such.
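A minimal sketch of the reporting idea using Python's standard-library `urllib.robotparser` (the robots.txt content and URLs below are hypothetical, for illustration only):

```python
from urllib import robotparser

# Hypothetical robots.txt content for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def report_disallowed(urls, agent="*"):
    """Return the subset of URLs that robots.txt does not allow for `agent`."""
    rp = robotparser.RobotFileParser()
    rp.parse(ROBOTS_TXT.splitlines())
    return [u for u in urls if not rp.can_fetch(agent, u)]

print(report_disallowed([
    "https://example.com/index.html",
    "https://example.com/private/data.html",
]))
# → ['https://example.com/private/data.html']
```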
-
Hi,
I'm using `pup v0.4.0`
I cannot select two different attributes using `attr{}`:
Selecting the `title` attribute of the `link[type="application/x-wiki"]` element:
```css
$ curl -qs htt…
sebma updated 2 years ago
-
https://en.wikipedia.org/wiki/Robots_exclusion_standard#Crawl-delay_directive
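The linked Crawl-delay directive can be read with Python's standard-library `urllib.robotparser` via `crawl_delay()` (the robots.txt content below is hypothetical, for illustration):

```python
from urllib import robotparser

# Hypothetical robots.txt declaring a 10-second crawl delay.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())
delay = rp.crawl_delay("*")  # seconds a polite crawler should wait between requests
print(delay)
# → 10
```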
-
Example HTML:
```
Robots exclusion standard
date: xyz
```
Now it would be great if pup could be used in a way that lets you:
- iterate over all `h1`
- for each h1, print out `Robots exclusion standa…
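Since the pup feature request above is truncated, here is a rough Python sketch of the iterate-over-`h1` idea using the standard-library `html.parser` (the input HTML is a made-up example):

```python
from html.parser import HTMLParser

class H1Collector(HTMLParser):
    """Collect the text content of every <h1> element in a document."""
    def __init__(self):
        super().__init__()
        self.in_h1 = False
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.in_h1 = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_h1 = False

    def handle_data(self, data):
        if self.in_h1:
            self.headings[-1] += data

parser = H1Collector()
parser.feed("<h1>Robots exclusion standard</h1><p>date: xyz</p><h1>Second</h1>")
print(parser.headings)
# → ['Robots exclusion standard', 'Second']
```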
-
DSNP should support a flag as part of the user profile to indicate a specific field should not be indexed for the public, therefore not searchable.
This would be similar to https://en.wikipedia.org…
-
While translating [Decentralized Identifiers (DIDs) v1.0](https://www.w3.org/TR/did-core/) into Korean, @lukasjhan spotted several [broken links](https://validator.w3.org/checklink?uri=https%3A%2F%2Fw…
-
see
https://github.com/gjtorikian/robotstxt-parser
-
Based on politeness considerations, it should support robots.txt.
-
To control crawlers that are getting deeper into the database than desirable, including the curator activity data, which is a known bug.
-
Any plans for robots.txt compliance?