-
This will mitigate some access issues from Phantom and avoid hard-coding against Heritrix's UI elements.
https://webarchive.jira.com/wiki/display/Heritrix/Heritrix+3.x+API+Guide
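As a rough illustration of what driving Heritrix through its REST API instead of its UI could look like, here is a minimal Python sketch. It assumes the default local endpoint `https://localhost:8443/engine`, HTTP digest authentication, and a self-signed certificate; the job name, credentials, and helper names (`build_job_action`, `make_opener`, `run_action`) are hypothetical and not taken from this issue.

```python
import ssl
import urllib.parse
import urllib.request


def build_job_action(base_url, job_name, action):
    """Build a POST request for a job action such as 'build' or 'launch'."""
    url = f"{base_url.rstrip('/')}/engine/job/{urllib.parse.quote(job_name)}"
    body = urllib.parse.urlencode({"action": action}).encode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Accept": "application/xml"},  # ask for a machine-readable reply
        method="POST",
    )


def make_opener(base_url, user, password):
    """Create an opener that performs digest auth against base_url."""
    manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    manager.add_password(None, base_url, user, password)
    # Assumption: a local instance with a self-signed certificate, so
    # verification is disabled. Do not do this for a remote host.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    return urllib.request.build_opener(
        urllib.request.HTTPSHandler(context=context),
        urllib.request.HTTPDigestAuthHandler(manager),
    )


def run_action(opener, request):
    """Send the request and return the response body as text."""
    with opener.open(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")
```

With a running instance, `run_action(make_opener(base, user, password), build_job_action(base, "myjob", "launch"))` would replace a click on the launch button, so nothing depends on the page layout.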
-
```
What steps will reproduce the problem?
1. Put your robots.txt at http://localhost/robots.txt with these lines:
User-agent: *
Disallow: /
2. Crawl some page on localhost.
3. You will get the contents…
-
```
There should be a blacklist facility for blocking machines.
```
Original issue reported on code.google.com by `alok.ku...@gmail.com` on 9 Oct 2009 at 6:29
-
```
What steps will reproduce the problem?
1. Crawl URLs that contain '\'.
Example:
http://www.lngs.gov.cn/newFormsFolders\LNGS_FORMS_633800715869843750XQJ.doc
The browser can recognize the URL.
What i…
-
This can be done programmatically by killing the process labeled java, but that is hacky: it would require grepping the process list from within WAIL, and it is problematic.
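For reference, the process-list approach described above might look like the following Python sketch. It matches on a caller-supplied command-line substring rather than the bare label java, which lowers (but does not remove) the risk of terminating an unrelated JVM. The function names and any substring passed in are hypothetical; nothing here is confirmed to be how WAIL does it.

```python
import os
import signal
import subprocess


def find_pids(ps_output, needle):
    """Return PIDs from 'pid command-line' rows whose command line contains needle."""
    pids = []
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) == 2 and parts[0].isdigit() and needle in parts[1]:
            pids.append(int(parts[0]))
    return pids


def stop_matching(needle):
    """Send SIGTERM to every process whose command line contains needle."""
    # 'pid=' and 'args=' suppress the header row; -A lists all processes.
    listing = subprocess.run(
        ["ps", "-A", "-o", "pid=", "-o", "args="],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    # Never signal the current process, even if its command line matches.
    pids = [pid for pid in find_pids(listing, needle) if pid != os.getpid()]
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
    return pids
```

Even with a narrower substring this remains a heuristic: it can miss a renamed launcher or match a second instance, which is the fragility noted above.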
-
Tested on OS X 10.10.2 dev build
-
```
What steps will reproduce the problem?
1. Crawl a small site.
2. Copy the crawl to a new job and set the only_new_records option to "true".
3. Crawl the site again
If you monitor the wire, th…
-
```
Hard-coded versions in the POM will be replaced with variables that the
build operator can override using a settings.xml file.
```
Original issue reported on code.google.com by `ryan.justin.smi…