-
```
What steps will reproduce the problem?
1. Install Google SiteMap Generator on a Windows 2003 server.
2. Configure a URL for which you want to generate a sitemap.
What is the expected output? What do…
```
-
robotspy==0.8.0
```python
import robots
content = """
User-agent: mozilla/5
Disallow: /
"""
check_url = "https://example.com"
user_agent = "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWe…
-
```
What steps will reproduce the problem?
1. Find a website where robots.txt has something similar to
User-agent: *
Crawl-delay: 80
2. Run the crawler with a parser
What is the expected output? What…
-
When a robots.txt file contains regular rules and sitemaps, everything works fine.
However, issues arise when:
1. The robots.txt file contains `#` comments (1) if the comment appears somewhere in th…
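One common way parsers deal with this (a sketch, not necessarily robotspy's actual implementation) is to strip everything from the first `#` onward before interpreting a line, since robots.txt has no quoting rules:

```python
def strip_comment(line: str) -> str:
    """Drop a trailing '#' comment from a robots.txt line.

    Everything from the first '#' onward is discarded, then surrounding
    whitespace is trimmed, so the remainder can be parsed as a directive.
    """
    return line.split("#", 1)[0].strip()

print(strip_comment("Disallow: /private  # keep crawlers out"))  # Disallow: /private
print(strip_comment("# whole-line comment"))                     # (empty string)
```

A whole-line comment reduces to an empty string, which the parser can then skip like a blank line.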
-
I am trying to crawl data from this website: http://www.companys.com.tw/.
I can get the full HTML from other websites, but I get completely empty content from this URL when my program runs `page.…
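A first diagnostic step (an assumption, since the excerpt is truncated: some servers return an empty body to clients that do not look like a browser) is to fetch the page with an explicit `User-Agent` header and compare what comes back:

```python
# Diagnostic sketch: fetch a URL with an explicit User-Agent header to
# check whether the server serves different content to non-browser clients.
import urllib.request

def fetch(url: str, user_agent: str) -> bytes:
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read()

# Hypothetical usage against the site in question:
# body = fetch("http://www.companys.com.tw/", "Mozilla/5.0")
# print(len(body))
```

If the body is non-empty here but empty in the crawler, the difference likely lies in request headers or in JavaScript-rendered content that a plain HTTP fetch cannot see.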
-
There are some sitemaps which recursively contain other sitemaps. For instance:
https://www.dailythanthi.com/Sitemap/Sitemap.xml
But the nested sitemaps may or may not comply with the sitemap format.
…
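A defensive way to walk such nested sitemaps (a sketch; `fetch` is a hypothetical caller-supplied callable that downloads the XML text for a sitemap URL) is to recurse on `<sitemapindex>` documents and skip anything that does not parse as XML:

```python
# Sketch of a defensive recursive sitemap walker: recurses into sitemap
# index files, collects <loc> entries from leaf sitemaps, and silently
# skips documents that are not well-formed XML.
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def extract_urls(xml_text, fetch, seen=None):
    """Collect page URLs from a sitemap or a (possibly nested) sitemap index."""
    seen = set() if seen is None else seen
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return []  # non-conforming sitemap: skip it instead of crashing
    urls = []
    if root.tag == NS + "sitemapindex":
        for loc in root.iter(NS + "loc"):
            child = loc.text.strip()
            if child not in seen:  # guard against cyclic references
                seen.add(child)
                urls.extend(extract_urls(fetch(child), fetch, seen))
    else:
        urls.extend(loc.text.strip() for loc in root.iter(NS + "loc"))
    return urls
```

The `seen` set matters for exactly the recursive case this issue describes: without it, an index that (directly or indirectly) references itself would recurse forever.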