-
I'm waiting to create these tests until that function is updated to use a CSV, like the warc_download.py script, because the function will change radically then. It would be hard to test no…
-
*What I wanted:* An available HTTP resource to be downloaded, as can be done in the browser.
*What I expect:* The resource will be downloaded.
*What happened:*
The following shows up repe…
-
## Describe the bug
Pywb is throwing a LiveResourceException when receiving a self-redirect (3xx) from OutbackCDX. This results in Pywb displaying a blank page with the text "Not found".
## Steps …
-
```
make python fails with:
(...at least on x86_64...)
--- 8< ---
/usr/bin/ld: lib/private/plugin/python/warc_wrap.o: relocation R_X86_64_32S
against `.rodata' can not be used when making a shared ob…
-
```
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Warc-performSelector-leaks"
code....
#pragma clang diagnostic pop
```
Ignore the performSelector warning.
-
If a post is reblogged on another blog, the image is also on that blog. This means the image will be downloaded twice for each blog, which eventually leads to a lot of duplication in the crawl. This i…
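
One common way to avoid storing the same image twice is to key each payload by a content digest and skip anything already seen. The following is only a minimal sketch of that idea, not how the crawler in question works: the `SeenPayloads` class, the example URLs, and the image bytes are all hypothetical.

```python
import hashlib


class SeenPayloads:
    """Tracks payload digests so an identical image is stored only once."""

    def __init__(self):
        # Maps a SHA-1 hex digest to the first URL that delivered that payload.
        self._first_url_by_digest = {}

    def check(self, url, payload):
        """Return the URL that first supplied this payload, or None if it is new."""
        digest = hashlib.sha1(payload).hexdigest()
        first_url = self._first_url_by_digest.get(digest)
        if first_url is None:
            # First sighting: remember where it came from.
            self._first_url_by_digest[digest] = url
        return first_url


seen = SeenPayloads()
image = b"\x89PNG hypothetical image bytes"

# The same bytes served from two blogs: only the first one counts as new.
for url in ("https://blog-a.example/img/1.png", "https://blog-b.example/img/1.png"):
    original = seen.check(url, image)
    print(url, "is new" if original is None else f"duplicates {original}")
```

Note that this only detects the duplicate after the bytes have been fetched; avoiding the second download entirely would need a URL-level or metadata-level check, which depends on how the blogs expose reblogged images.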
-
*What I wanted/expected:* Cookies, read from the provided cookies.txt, to be used during crawls with `wpull`.
*What happened:* `wpull` ignores the provided cookies.txt file and crawls without it.
…
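
When debugging a report like this, it can help to first rule out a malformed cookies file. The sketch below does not use or reproduce `wpull`; it only checks, with Python's standard-library `http.cookiejar.MozillaCookieJar`, that a file parses as Netscape-format cookies. The `load_cookies_txt` helper and the sample file contents are hypothetical.

```python
import http.cookiejar
import os
import tempfile


def load_cookies_txt(path):
    """Parse a Netscape-format cookies.txt and return (domain, name, value) tuples."""
    jar = http.cookiejar.MozillaCookieJar()
    # Keep session cookies and expired cookies so we see everything in the file.
    jar.load(path, ignore_discard=True, ignore_expires=True)
    return [(cookie.domain, cookie.name, cookie.value) for cookie in jar]


# Hypothetical sample file: header line, then one tab-separated cookie entry
# (domain, include-subdomains flag, path, secure flag, expiry, name, value).
SAMPLE = (
    "# Netscape HTTP Cookie File\n"
    ".example.com\tTRUE\t/\tFALSE\t4102444800\tsession\tabc123\n"
)

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write(SAMPLE)
    sample_path = handle.name

try:
    loaded = load_cookies_txt(sample_path)
finally:
    os.remove(sample_path)

print(loaded)
```

If the file is missing the header line or has the wrong number of tab-separated fields, `load` raises `http.cookiejar.LoadError`, which would point at the file rather than at the crawler.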
-
I'd be interested in getting warc working on Python 3. I spent some time fixing up the imports with six, but lost momentum with gzip2.py because Python 3's gzip has moved things around.
Is anyone else i…
-
```
SRS 32 — The command line tool shall notify the user of any WARC-record's
anomalies, missing required fields or incompatible fields types.
```
Original issue reported on code.google.com by `gordo…
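
As a rough illustration of what such a check could look like, here is a minimal sketch in Python. It is not the tool's implementation: the `find_anomalies` function and the sample headers are hypothetical, and it assumes the record's named fields have already been parsed into a dictionary of strings. The four required field names are the ones the WARC format mandates for every record.

```python
# Named fields that every WARC record must carry.
REQUIRED_FIELDS = ("WARC-Type", "WARC-Record-ID", "WARC-Date", "Content-Length")


def find_anomalies(headers):
    """Return a list of human-readable problems found in one record's headers."""
    # WARC field names are case-insensitive, so compare them lowercased.
    present = {name.lower(): value for name, value in headers.items()}
    problems = []

    for field in REQUIRED_FIELDS:
        if field.lower() not in present:
            problems.append(f"missing required field: {field}")

    # Content-Length must be a non-negative decimal integer.
    length = present.get("content-length")
    if length is not None:
        text = length.strip()
        if not (text.isascii() and text.isdigit()):
            problems.append(f"Content-Length is not a non-negative integer: {length!r}")

    return problems


# Hypothetical record: no WARC-Record-ID, and a malformed Content-Length.
record_headers = {
    "WARC-Type": "response",
    "WARC-Date": "2014-01-01T00:00:00Z",
    "Content-Length": "12a",
}

for problem in find_anomalies(record_headers):
    print(problem)
```

A fuller check would also validate the other field types, such as the `WARC-Date` timestamp format, which this sketch leaves out.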
-
I want to extract files from WARCs.
When I use `jwattools extract`, I get a bunch of filenames like `extracted.001` or something like that. Am I doing something wrong?