-
```
What steps will reproduce the problem?
1. Put your robots.txt at http://localhost/robots.txt with these lines:
User-agent: *
Disallow: /
2. Crawl some page on localhost.
3. You will get the contents…
```
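If Heritrix really is fetching pages that robots.txt disallows, the behaviour the reporter expects looks roughly like the following. This is a minimal, self-contained sketch of a robots.txt check, not Heritrix's actual robots-handling code: it fetches /robots.txt, collects the Disallow prefixes in the `User-agent: *` group, and refuses any URL whose path matches one of them.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Minimal robots.txt check: only handles the "User-agent: *" group and its Disallow rules. */
public class RobotsCheck {

    /** Fetches http://host/robots.txt and returns the Disallow prefixes for "User-agent: *". */
    static List<String> disallowedPrefixes(String scheme, String host) throws Exception {
        List<String> rules = new ArrayList<>();
        URL robots = new URI(scheme, host, "/robots.txt", null).toURL();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(robots.openStream(), StandardCharsets.UTF_8))) {
            boolean inWildcardGroup = false;
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.toLowerCase().startsWith("user-agent:")) {
                    inWildcardGroup = line.substring(11).trim().equals("*");
                } else if (inWildcardGroup && line.toLowerCase().startsWith("disallow:")) {
                    String path = line.substring(9).trim();
                    if (!path.isEmpty()) {
                        rules.add(path);
                    }
                }
            }
        }
        return rules;
    }

    /** Returns true only if the URL's path is not blocked by a Disallow prefix. */
    static boolean isAllowed(String url, List<String> disallowed) throws Exception {
        String path = new URI(url).getPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        List<String> rules = disallowedPrefixes("http", "localhost");
        // With "Disallow: /" in robots.txt this prints false, so the page should not be fetched.
        System.out.println(isAllowed("http://localhost/some/page.html", rules));
    }
}
```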
-
```
Heritrix 3.0.0 was released in December of 2009. It would be nice to
support the new 3.0.0 version.
```
Original issue reported on code.google.com by `ryan.justin.smith@gmail.com` on 11 Jan 20…
-
```
What steps will reproduce the problem?
1. If the URLs contain '\', for example:
http://www.lngs.gov.cn/newFormsFolders\LNGS_FORMS_633800715869843750XQJ.doc
the browser can recognize the URL.
What i…
```
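One way a crawler could tolerate such links, offered only as an illustration (not as how Heritrix handles them), is to normalize backslashes in the path portion of the URL to forward slashes before parsing and fetching, which mirrors what browsers do:

```java
import java.net.URI;

/** Sketch: rewrite '\' in the path of an http(s) URL to '/', as browsers typically do. */
public class BackslashNormalizer {

    static String normalize(String rawUrl) {
        // Split off any query/fragment first so only the path part is touched.
        int cut = rawUrl.length();
        for (char c : new char[] {'?', '#'}) {
            int i = rawUrl.indexOf(c);
            if (i >= 0 && i < cut) {
                cut = i;
            }
        }
        String head = rawUrl.substring(0, cut).replace('\\', '/');
        return head + rawUrl.substring(cut);
    }

    public static void main(String[] args) {
        String url = "http://www.lngs.gov.cn/newFormsFolders\\LNGS_FORMS_633800715869843750XQJ.doc";
        String fixed = normalize(url);
        System.out.println(fixed);
        // The normalized form now parses cleanly as a URI path.
        System.out.println(URI.create(fixed).getPath());
    }
}
```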
-
```
What steps will reproduce the problem?
1. Configure a brand new crawl in Heritrix on a new table and set
'only_new_records' to 'true'
What is the expected output? What do you see instead?
Heritri…
```
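Conceptually, 'only_new_records' amounts to probing HBase for the row key before writing and skipping rows that already exist. The sketch below shows that idea with the stock HBase client API; the class, table, and column names are illustrative assumptions rather than hbase-writer's actual internals, and it uses the newer Connection/Table API rather than the HTable-era API the project targeted.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;

/** Illustrative only: write a crawled record unless its row key is already present. */
public class OnlyNewRecordsWriter {

    private final boolean onlyNewRecords;   // mirrors the 'only_new_records' setting
    private final Table table;

    OnlyNewRecordsWriter(Table table, boolean onlyNewRecords) {
        this.table = table;
        this.onlyNewRecords = onlyNewRecords;
    }

    void write(byte[] rowKey, byte[] family, byte[] qualifier, byte[] value) throws IOException {
        if (onlyNewRecords && table.exists(new Get(rowKey))) {
            return; // row already stored by an earlier crawl; skip it
        }
        Put put = new Put(rowKey);
        put.addColumn(family, qualifier, value);
        table.put(put);
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("crawl"))) {
            OnlyNewRecordsWriter writer = new OnlyNewRecordsWriter(table, true);
            writer.write(Bytes.toBytes("http://example.com/"), Bytes.toBytes("content"),
                    Bytes.toBytes("raw_data"), Bytes.toBytes("..."));
        }
    }
}
```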
-
```
What steps will reproduce the problem?
1. Use hbase-writer with Heritrix on large crawls running over several hours.
2. The process will hang after too many HTable instances are created.
What is the e…
```
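The usual remedy for this class of hang is to stop constructing a new HTable per write and instead share one long-lived connection for the whole crawl, checking out short-lived table handles per operation. The sketch below illustrates that pattern with the newer HBase Connection/Table API; the names and structure are assumptions for illustration, not hbase-writer's actual code.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;

/** Illustrative pattern: one shared Connection for the crawl, short-lived Table handles per write. */
public class SharedConnectionWriter implements AutoCloseable {

    private final Connection connection;   // heavyweight; create once and reuse for hours
    private final TableName tableName;

    SharedConnectionWriter(Configuration conf, String table) throws IOException {
        this.connection = ConnectionFactory.createConnection(conf);
        this.tableName = TableName.valueOf(table);
    }

    void write(byte[] rowKey, byte[] family, byte[] qualifier, byte[] value) throws IOException {
        // Table handles are lightweight and not thread-safe: obtain one per operation and close
        // it promptly, instead of accumulating unclosed table instances over the life of the crawl.
        try (Table table = connection.getTable(tableName)) {
            Put put = new Put(rowKey);
            put.addColumn(family, qualifier, value);
            table.put(put);
        }
    }

    @Override
    public void close() throws IOException {
        connection.close();
    }

    public static void main(String[] args) throws IOException {
        try (SharedConnectionWriter writer =
                     new SharedConnectionWriter(HBaseConfiguration.create(), "crawl")) {
            writer.write(Bytes.toBytes("http://example.com/"), Bytes.toBytes("content"),
                    Bytes.toBytes("raw_data"), Bytes.toBytes("..."));
        }
    }
}
```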