-
```
User-agent: *
Disallow: /
```
```
// With "Disallow: /" every path should be disallowed:
$this->assertTrue($parser->isDisallowed("&&1@|"));
$this->assertFalse($parser->isAllowed('+£€@@1¤'));
```
The two tests above fail; paths are allowed according to …
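For comparison, a quick check against Python's stdlib `urllib.robotparser`, which reports both paths as disallowed under the same rules (the stdlib parser is used here only as a reference behaviour, not as the parser under test):

```python
import urllib.robotparser

# Reference behaviour: with "Disallow: /" every path is off limits,
# no matter which characters it contains.
rp = urllib.robotparser.RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /"])
print(rp.can_fetch("*", "/&&1@|"))    # False
print(rp.can_fetch("*", "/+£€@@1¤"))  # False
```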
-
Reppy doesn't work past Python 3.8 (seomoz/reppy#122, seomoz/reppy#132), which means our robots.txt parser isn't working (#81).
Python 3.8 also reaches end-of-life next year so this needs to happen …
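In the meantime, a minimal sketch of what a dependency-free fallback could look like, using the stdlib's `urllib.robotparser` (this is just one candidate, not the chosen replacement; the `"mybot"` agent name is a placeholder):

```python
import urllib.robotparser

# Stdlib-only fallback: no reppy dependency, works on current Python versions.
rp = urllib.robotparser.RobotFileParser("https://example.com/robots.txt")
rp.read()

if rp.can_fetch("mybot", "https://example.com/some/path"):
    ...  # fetch the page
```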
-
I had a robots.txt file to process that included the following line, which caused a fatal error:
```
User-agent:
```
No user agent was specified, and the robots.txt parser errored when checking a U…
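A hedged sketch of how directive parsing could tolerate an empty value instead of raising (`parse_directive` is a hypothetical helper, not the project's actual parser):

```python
def parse_directive(line):
    # Split on the first colon and strip inline comments; return None for
    # lines with no value (e.g. a bare "User-agent:") instead of raising.
    field, sep, value = line.partition(":")
    if not sep:
        return None
    field = field.strip().lower()
    value = value.split("#", 1)[0].strip()
    if not field or not value:
        return None
    return field, value

print(parse_directive("User-agent:"))        # None, no fatal error
print(parse_directive("User-agent: mybot"))  # ('user-agent', 'mybot')
```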
-
I found an example [https://booking.com/robots.txt](https://booking.com/robots.txt) where sitemaps are marked as **Disallowed**
```
Sitemap: https://www.booking.com/sitembk-index-https.xml
Use…
```
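One way to surface this kind of inconsistency would be to check each declared sitemap URL against the same file's rules. A minimal sketch with the stdlib parser (`site_maps()` needs Python 3.8+; checking against the generic `*` agent is an assumption):

```python
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser("https://www.booking.com/robots.txt")
rp.read()
for sitemap in rp.site_maps() or []:
    # Flag sitemap URLs that the same robots.txt disallows for everyone.
    if not rp.can_fetch("*", sitemap):
        print("sitemap is itself disallowed:", sitemap)
```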
-
# Bug report
### Bug description:
https://github.com/python/cpython/blob/3.12/Lib/urllib/robotparser.py#L227
`self.path == "*"` will never be `true` because of this line:
https://github.com/python…
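A minimal reproduction of the dead branch, assuming the `== "*"` comparison was meant to support a bare wildcard rule:

```python
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.parse(["User-agent: *", "Disallow: *"])
# The rule's path is percent-encoded to "%2A" before the comparison in
# applies_to(), so the `self.path == "*"` branch can never match and the
# wildcard rule is silently ignored:
print(rp.can_fetch("*", "https://example.com/anything"))  # True
```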
-
Hello,
are there any chances to split the current module into two separate modules? One for robots.txt and one for sitemap.xml?
I co-maintain the [crawler4j](https://github.com/yasserg/crawle…
-
See [https://yandex.com/support/webmaster/controlling-robot/robots-txt.xml#clean-param].
Not sure it is part of the standard spec, but it seems to be used; see for example [http://fishki.net/robots.txt].
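For reference, Yandex documents the syntax as `Clean-param: p0[&p1...] [path]`: the listed query parameters are insignificant for URLs under the given path prefix. A rough sketch of applying one such rule during URL normalization (`clean_url` is a hypothetical helper, not an existing API):

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

def clean_url(url, params, path_prefix="/"):
    # Drop the query parameters named in a Clean-param rule when the
    # URL's path falls under the rule's path prefix.
    parts = urlparse(url)
    if not parts.path.startswith(path_prefix):
        return url
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in params]
    return urlunparse(parts._replace(query=urlencode(kept)))

# Clean-param: ref /some_dir/
print(clean_url("http://example.com/some_dir/page?ref=site&id=42",
                {"ref"}, "/some_dir/"))
# http://example.com/some_dir/page?id=42
```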
-
The [Crawl-Delay](http://en.wikipedia.org/wiki/Robots_exclusion_standard#Crawl-delay_directive) directive in robots.txt looks useful. If it is present, the delay suggested there looks like a good way to ad…
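Python's stdlib parser already exposes this directive via `crawl_delay()`; a minimal sketch of pacing requests with it (the `"mybot"` agent name and the 1-second fallback are assumptions):

```python
import time
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser("https://example.com/robots.txt")
rp.read()

# Honour Crawl-delay if present, else fall back to a default politeness delay.
delay = rp.crawl_delay("mybot") or 1.0
time.sleep(delay)
```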
-
CommonCrawl has released a dataset containing robots.txt files: [http://commoncrawl.org/2016/09/robotstxt-and-404-redirect-data-sets/].
This could be used to test our parsing code.
CC @sebastian-na…
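If anyone wants to try this, a rough sketch assuming the third-party `warcio` package and a locally downloaded archive from the dataset (the file name is a placeholder):

```python
from warcio.archiveiterator import ArchiveIterator

with open("robotstxt.warc.gz", "rb") as stream:
    for record in ArchiveIterator(stream):
        if record.rec_type == "response":
            body = record.content_stream().read()
            # feed `body` to the parser under test here
```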
-
It would be nice to output a clean and valid robots.txt.
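A hedged sketch of what such a serializer could look like (`render_robots` and its input shape are assumptions, not an existing API):

```python
def render_robots(groups, sitemaps=()):
    # groups: mapping of user-agent -> list of (directive, value) pairs
    lines = []
    for agent, rules in groups.items():
        lines.append(f"User-agent: {agent}")
        lines.extend(f"{directive}: {value}" for directive, value in rules)
        lines.append("")  # blank line between groups
    lines.extend(f"Sitemap: {url}" for url in sitemaps)
    return "\n".join(lines).strip() + "\n"

print(render_robots({"*": [("Disallow", "/private/")]},
                    ["https://example.com/sitemap.xml"]))
# User-agent: *
# Disallow: /private/
#
# Sitemap: https://example.com/sitemap.xml
```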