-
**Is your feature request related to a problem? Please describe.**
Bots on the internet should honor robots.txt (see [RFC 9309](https://datatracker.ietf.org/doc/rfc9309/)).
**Describe the so…
-
Is supporting robots.txt within the scope of this project?
-
In https://github.com/bopoda/robots-txt-parser/blob/master/src/RobotsTxtParser/RobotsTxtValidator.php, there's a chunk of code at line 47:
```
// if has not allow rules we can determine when…
```
-
```
User-Agent: *
Allow: /mlist
Allow: /sitemap
Disallow: /
```
`$parser->isAllowed('/')` returns true. Is that expected? Given the `Disallow: /` rule, I would expect false.
Windows 10
PHP 7.2
version: 0.2.4
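For comparison, the same rules can be checked with Python's stdlib `urllib.robotparser`. (Note it evaluates rules in file order rather than by RFC 9309 longest-match precedence, but the two agree for this input; this is a sketch for cross-checking, not a statement about the PHP library's intended behavior.)

```python
import urllib.robotparser

ROBOTS = """\
User-Agent: *
Allow: /mlist
Allow: /sitemap
Disallow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS.splitlines())

# "/" matches only "Disallow: /", so it should be disallowed.
print(rp.can_fetch("*", "/"))        # False
# "/mlist" matches the explicit Allow rule.
print(rp.can_fetch("*", "/mlist"))   # True
```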
-
BPO | [13281](https://bugs.python.org/issue13281)
--- | :---
Nosy | @terryjreedy, @orsenthil, @ezio-melotti, @merwok, @akheron
Files | [robotparser.py.patch](https://bugs.python.org/file23538/robotpar…
-
**Explanation**
I have been seeing a lot of sitemap.xml issues with the various deployments of Pengin-pi. This is the second time Google has delisted us because of sitemap issues. I have now correc…
-
Hello,
It seems that there is an issue with the external sources. Watching through a proxy, I cannot see any of these sources being used (no Google or VirusTotal, and no request to robots.txt). I have done several tests w…
-
Is scrapy-splash not compatible with obeying robots.txt? Every time I make a query it attempts to download robots.txt from the Docker instance of scrapy-splash. The below is my settings file. I'm t…
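For context, a minimal settings sketch following the scrapy-splash README (the Splash URL and middleware priorities are taken from its docs; treat this as an assumption, not a verified fix). Because `SplashRequest`s are rewritten to point at `SPLASH_URL`, Scrapy's robots.txt middleware may resolve robots.txt against the Splash host, which would explain the behavior described.

```python
# Scrapy settings sketch: Splash rendering with robots.txt enforcement on.
ROBOTSTXT_OBEY = True

# Assumed local Docker instance of Splash.
SPLASH_URL = "http://localhost:8050"

DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
```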
-
Create a queue of URLs to crawl from each page you visit. Should obey [robots.txt](https://github.com/eklem/crawler-in-browser/issues/9)
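The queue-plus-robots.txt idea above can be sketched as a breadth-first crawl that caches one parser per host (the project itself is browser-side JavaScript; this Python sketch only illustrates the structure, and `fetch`/`extract_links` are hypothetical caller-supplied helpers):

```python
from collections import deque
from urllib.parse import urljoin, urlparse
import urllib.robotparser

def crawl(start_url, fetch, extract_links, limit=100):
    """Breadth-first crawl that consults robots.txt before each fetch."""
    parsers = {}              # per-host robots.txt parsers, cached
    seen = {start_url}
    queue = deque([start_url])
    while queue and limit:
        url = queue.popleft()
        host = urlparse(url).netloc
        rp = parsers.get(host)
        if rp is None:
            rp = urllib.robotparser.RobotFileParser(urljoin(url, "/robots.txt"))
            rp.read()         # download and parse robots.txt once per host
            parsers[host] = rp
        if not rp.can_fetch("*", url):
            continue          # skip disallowed URLs
        page = fetch(url)
        limit -= 1
        for link in extract_links(page, url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
```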
-
The robots.txt rules should survive restarts and be per-domain.
See http://www.robotstxt.org/robotstxt.html for some examples. I didn't find any standard Java parsers online in a quick search, so may…
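Though the project in question is Java, the survive-restarts idea can be sketched briefly: persist the raw robots.txt body per domain and rebuild parsers on startup. (The cache file name and helpers here are illustrative; Python's stdlib parser stands in for whatever Java parser is chosen.)

```python
import json
import pathlib
import urllib.robotparser

STORE = pathlib.Path("robots_cache.json")  # hypothetical on-disk cache

def save_rules(rules_by_domain):
    """Persist raw robots.txt bodies (domain -> text) so rules survive restarts."""
    STORE.write_text(json.dumps(rules_by_domain))

def load_parsers():
    """Rebuild one parser per domain from the persisted bodies."""
    if not STORE.exists():
        return {}
    parsers = {}
    for domain, body in json.loads(STORE.read_text()).items():
        rp = urllib.robotparser.RobotFileParser()
        rp.parse(body.splitlines())
        parsers[domain] = rp
    return parsers
```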