-
```
What steps will reproduce the problem?
1. try giving arc2warc.sh an input directory (-d ) that contains
'arc' - e.g. arc2warc.sh -d /home/crawler/heritrix-1.12.1/jobs/test/arcs/
What is the expec…
-
```
What steps will reproduce the problem?
1. If the URLs contain '\'
Example:
http://www.lngs.gov.cn/newFormsFolders\LNGS_FORMS_633800715869843750XQJ.doc
The browser can recognize the URL
What i…
-
```
What steps will reproduce the problem?
1. Put your robots.txt at http://localhost/robots.txt with these lines:
User-agent: *
Disallow: /
2. Crawl some page of localhost
3. You will get the contents…
```
-
```
SRS 41 — A command line tool “arc2warc” incorporating libwarc shall be
able to migrate data in ARC-records to WARC-record format.
```
Original issue reported on code.google.com by `gordon.p...@g…
-
```
SRS 31 — A command line tool shall be implemented utilising libwarc to
check the consistency of WARC-records and their conformance to the WARC ISO
standard.
```
Original issue reported on code.go…