ArchiveTeam/grab-site
The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
Other
1.31k stars
129 forks
issues
Document `--wpull-args=--no-warc-compression`
#190
TheTechRobo
closed
2 years ago
1
Change settings mid-crawl
#189
TheTechRobo
opened
3 years ago
0
Grab-site gets only a single page
#188
mathuryash5
opened
3 years ago
4
Cookies not staying
#187
TheTechRobo
opened
3 years ago
5
clearer error when URL is invalid
#186
TheTechRobo
opened
3 years ago
0
My computer crashed. I'm 10gb into a crawl. How can I "resume" this crawl?
#185
komali2
closed
3 years ago
2
Ignore local/lan-only hosts (and invalid domains).
#184
jtagcat
opened
3 years ago
0
--no-offsite-links doesn't work
#183
tripleo1
closed
3 years ago
4
Dockerfile?
#182
818S
closed
3 years ago
4
Can't evaluate Select
#181
TheTechRobo
closed
3 years ago
4
Update setup.py
#180
PythonCoderAS
closed
2 years ago
0
Consider an option to generate WACZ files after a crawl is done for better replay with ReplayWeb.page
#179
ikreymer
opened
3 years ago
1
Ignore set: XenForo 1/2 and PostNuke forum engines
#178
nekto-nekto
opened
3 years ago
1
del
#177
nekto-nekto
closed
3 years ago
0
Issue-175: First pass at creating a Dockerfile for Nix that actually runs
#176
bknowles
closed
3 years ago
1
Add a Dockerfile for running grab-site in a Nix-based container
#175
bknowles
closed
3 years ago
1
Can't build lxml.etree (on macOS)
#174
bknowles
closed
3 years ago
9
[wpull] 'cython_function_or_method' object has no attribute 'lower'
#173
tempname1024
opened
3 years ago
0
[BUG] Twitter pages potentially not downloading correctly
#172
Coloradohusky
closed
3 years ago
2
Bash script for automatic upload
#171
raspher
closed
3 years ago
1
pull args for http-auth (e.g. --user --password) are ignored
#170
mep85
opened
3 years ago
0
Regexp exclusion problem
#169
manueldeprada
closed
3 years ago
3
Change wpull args during a crawl
#168
Coloradohusky
opened
4 years ago
2
ImportError: cannot import name 'SSLCertificateError'
#167
dragonxtek
closed
4 years ago
1
Make WARC files searchable
#166
Svekla
opened
4 years ago
1
Any solutions for already mentioned errors: Event loop is closed / Task is destroyed?
#165
weselow
opened
4 years ago
2
Pip build missing required package?
#164
cfcs
closed
2 years ago
8
cannot import name 'SSLCertificateError'
#163
mkrzmr
closed
4 years ago
6
More intelligent protocol selection
#162
masterX244
closed
4 years ago
2
--finished-warc-dir= not working for me
#161
BradCoffield
closed
4 years ago
2
What does the error status in URL queue mean?
#160
Phasip
opened
4 years ago
0
Possible to run in the cloud?
#159
BradCoffield
closed
4 years ago
8
WSL: lmdb.CorruptedError: mdb_get: MDB_CORRUPTED: Located page was wrong type
#158
menmob
closed
4 years ago
2
macOS-specific lxml crash: LookupError: unknown encoding: 'b'latin1''
#157
ivan
opened
4 years ago
2
DNS operation timed out
#156
nihelmasell
opened
4 years ago
1
Best way to grab this page?
#155
sardaukar
opened
4 years ago
6
Errors on initial URLs are retried forever
#154
JustAnotherArchivist
closed
5 years ago
0
Continuing or updating a grab
#153
nihelmasell
closed
4 years ago
5
Crawl eventually becomes nothing but "Disconnected from ws:// server:"...
#152
BradCoffield
closed
5 years ago
2
Crash on EOFError: Compressed file ended before the end-of-stream marker was reached
#151
ivan
opened
5 years ago
0
Homebrew install on macOS 10.14.4 (command 'clang' failed with exit status 1)
#150
markhdavis
closed
5 years ago
1
Add simplistic Dockerfile
#149
Fusl
opened
5 years ago
4
wpull crash when http_proxy is set
#148
yi
opened
5 years ago
2
Reference git repo in install_requires
#147
Fusl
closed
5 years ago
3
Seeking new maintainer / project owner
#146
ivan
opened
5 years ago
6
Recent version of pip removed --process-dependency-links
#145
ivan
closed
5 years ago
0
ftp:// crawls crash with AttributeError: 'ListingResponse' object has no attribute 'version'
#144
ivan
opened
5 years ago
0
--igsets=singletumblr misbehaves when start URL lacks trailing slash
#143
ivan
opened
5 years ago
0
Any way to resume a (input list) crawl?
#142
ghost
closed
5 years ago
3
Are there any plans on getting grab-site into the official Debian/Ubuntu software repositories?
#141
github-userx
closed
4 years ago
1