-
**Is your feature request related to a problem? Please describe.**
Bots on the internet should honor robots.txt (see [RFC 9309](https://datatracker.ietf.org/doc/rfc9309/)).
**Describe the so…
-
Is supporting robots.txt within the scope of this project?
-
In https://github.com/bopoda/robots-txt-parser/blob/master/src/RobotsTxtParser/RobotsTxtValidator.php, there's a chunk of code at line 47:
```
// if has not allow rules we can determine when…
```
-
```
User-Agent: *
Allow: /mlist
Allow: /sitemap
Disallow: /
```
`$parser->isAllowed('/')` returns true. Is that expected? Given the `Disallow: /` rule, I would expect false.
Windows 10
PHP 7.2
version: 0.2.4
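For comparison, the same rules can be checked with Python's stdlib `urllib.robotparser`. (Note it evaluates rules in file order rather than by RFC 9309 longest-match precedence, but the two agree for this input; this is a sketch for cross-checking, not a statement about the PHP library's intended behavior.)

```python
import urllib.robotparser

ROBOTS = """\
User-Agent: *
Allow: /mlist
Allow: /sitemap
Disallow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS.splitlines())

# "/" matches only "Disallow: /", so it should be disallowed.
print(rp.can_fetch("*", "/"))        # False
# "/mlist" matches the explicit Allow rule.
print(rp.can_fetch("*", "/mlist"))   # True
```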
-
BPO | [13281](https://bugs.python.org/issue13281)
--- | :---
Nosy | @terryjreedy, @orsenthil, @ezio-melotti, @merwok, @akheron
Files | [robotparser.py.patch](https://bugs.python.org/file23538/robotpar…
-
**Explanation**
I have been seeing a lot of sitemap.xml issues with the various deployments of Pengin-pi. This is the second time Google has delisted us because of sitemap issues. I have now correc…
-
Hello,
It seems that there is an issue with the external sources. Watching through a proxy, I cannot see any of these sources being used (no Google or VirusTotal, and no request to robots.txt). I have done several tests w…
-
Is scrapy-splash not compatible with obeying robots.txt? Every time I make a query it attempts to download robots.txt from the Docker instance of scrapy-splash. The below is my settings file. I'm t…
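For context, a minimal settings sketch following the scrapy-splash README (the Splash URL and middleware priorities are taken from its docs; treat this as an assumption, not a verified fix). Because `SplashRequest`s are rewritten to point at `SPLASH_URL`, Scrapy's robots.txt middleware may resolve robots.txt against the Splash host, which would explain the behavior described.

```python
# Scrapy settings sketch: Splash rendering with robots.txt enforcement on.
ROBOTSTXT_OBEY = True

# Assumed local Docker instance of Splash.
SPLASH_URL = "http://localhost:8050"

DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
```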
-
Create a queue of URLs to crawl from each page you visit. Should obey [robots.txt](https://github.com/eklem/crawler-in-browser/issues/9)
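The queue-plus-robots.txt idea above can be sketched as a breadth-first crawl that caches one parser per host (the project itself is browser-side JavaScript; this Python sketch only illustrates the structure, and `fetch`/`extract_links` are hypothetical caller-supplied helpers):

```python
from collections import deque
from urllib.parse import urljoin, urlparse
import urllib.robotparser

def crawl(start_url, fetch, extract_links, limit=100):
    """Breadth-first crawl that consults robots.txt before each fetch."""
    parsers = {}              # per-host robots.txt parsers, cached
    seen = {start_url}
    queue = deque([start_url])
    while queue and limit:
        url = queue.popleft()
        host = urlparse(url).netloc
        rp = parsers.get(host)
        if rp is None:
            rp = urllib.robotparser.RobotFileParser(urljoin(url, "/robots.txt"))
            rp.read()         # download and parse robots.txt once per host
            parsers[host] = rp
        if not rp.can_fetch("*", url):
            continue          # skip disallowed URLs
        page = fetch(url)
        limit -= 1
        for link in extract_links(page, url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
```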
-
The robots.txt rules should survive restarts and be per-domain.
See http://www.robotstxt.org/robotstxt.html for some examples. I didn't find any standard Java parsers online in a quick search, so may…
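Though the project in question is Java, the survive-restarts idea can be sketched briefly: persist the raw robots.txt body per domain and rebuild parsers on startup. (The cache file name and helpers here are illustrative; Python's stdlib parser stands in for whatever Java parser is chosen.)

```python
import json
import pathlib
import urllib.robotparser

STORE = pathlib.Path("robots_cache.json")  # hypothetical on-disk cache

def save_rules(rules_by_domain):
    """Persist raw robots.txt bodies (domain -> text) so rules survive restarts."""
    STORE.write_text(json.dumps(rules_by_domain))

def load_parsers():
    """Rebuild one parser per domain from the persisted bodies."""
    if not STORE.exists():
        return {}
    parsers = {}
    for domain, body in json.loads(STORE.read_text()).items():
        rp = urllib.robotparser.RobotFileParser()
        rp.parse(body.splitlines())
        parsers[domain] = rp
    return parsers
```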