ArchiveTeam/grab-site
The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
Other
1.35k
stars
134
forks
issues
Add config file using configparser
#91
12As
closed
8 years ago
1
Allow starting crawls directly from the dashboard
#90
brandongalbraith
closed
4 years ago
2
Problem using -i option again
#89
dkl3
closed
8 years ago
4
Add option to automatically crawl up to any potential directory listing
#88
dkl3
opened
8 years ago
8
FTP grabs non-functional WARCs
#87
rwoodpecker
closed
8 years ago
3
Allow multiple locations and protocols in gs-server environmental variables
#86
12As
closed
8 years ago
4
Enhancement idea: Automatic concurrency / delay management
#85
ethus3h
closed
8 years ago
4
Enhancement idea: show failed jobs on the dashboard until gs-server is restarted
#84
ethus3h
opened
8 years ago
4
Allow serving the dashboard with https://
#83
rwoodpecker
opened
8 years ago
10
Enhancement idea: easier multi-URL crawl starting and handling of aggressive extraction
#82
ethus3h
closed
8 years ago
10
Enhancement idea: URL prioritization
#81
ethus3h
opened
8 years ago
6
Multiple --wpull-args options don't seem to be respected
#80
ethus3h
opened
8 years ago
5
PSA: grab-site 0.11 upgrade notes: restart gs-server
#79
ivan
closed
8 years ago
0
Allow changing the websocket endpoint via a control file
#78
ivan
opened
8 years ago
0
Allow crawling localhost and LAN IPs if crawl starts on such a domain
#77
ivan
closed
7 years ago
3
Make gs-server require a single port only
#76
12As
closed
8 years ago
12
Support custom Python3 script
#75
Arkiver2
closed
8 years ago
10
NoneType object has no attribute 'end_control'
#74
ethus3h
opened
8 years ago
0
Log gets spammed with UTF-8 encode errors regarding unpaired surrogates
#73
ethus3h
opened
8 years ago
2
Enhancement idea: conditional ignores
#72
ethus3h
opened
8 years ago
1
Crawl with phantomjs doesn't exit after archiving a twitter hashtag page
#71
dkl3
closed
8 years ago
4
grab-site sometimes runs out of memory when downloading large files
#70
ivan
opened
8 years ago
8
Simplify development setups
#69
DanielOaks
closed
8 years ago
2
[SSL: UNSUPPORTED_PROTOCOL] with phantomjs
#68
rwoodpecker
closed
8 years ago
5
server: Fix exception when clients visit ws port via a browser
#67
DanielOaks
closed
8 years ago
1
Crawler gets permanent IP bans from typepad
#66
ivan
opened
8 years ago
0
Document OS X locale requirement in the install steps
#65
ivan
closed
8 years ago
1
OS X 10.11: default US-ASCII codec breaks click
#64
rwoodpecker
closed
8 years ago
14
Document how to grab a website that requires login
#63
ivan
closed
6 years ago
4
Can't start crawl on some URLs: "URL is not printable"
#62
ivan
closed
8 years ago
3
PSA: Report bugs here, I am not looking at the IRC channels for a little while
#61
ivan
closed
8 years ago
0
Crawls sometimes hang forever
#60
ivan
opened
8 years ago
4
Enhancement idea: delay/concurrency by regex
#59
ethus3h
opened
8 years ago
3
Allow resuming a crawl
#58
ivan
opened
8 years ago
9
Allow specifying a job ident
#57
12As
closed
8 years ago
1
Allow specifying working directory
#56
12As
closed
8 years ago
1
Problems Using phantom-js
#55
dkl3
closed
8 years ago
6
Enhancement idea: delta mode
#54
ethus3h
opened
8 years ago
0
gs-server raises KeyError with Firefox 42 client on OS X
#53
ethus3h
closed
8 years ago
8
Many crawls segfault when machine has high CPU load
#52
ethus3h
closed
6 years ago
46
Crashes with AssertionError: assert url_item.is_processed
#51
ethus3h
opened
8 years ago
2
Non-WSL Windows: remove the need for `set GRAB_SITE_NO_CCHARDET=1`
#50
ivan
opened
8 years ago
1
Non-WSL Windows: add install steps to README
#49
ivan
opened
8 years ago
0
Windows: add support for CRLF line endings in control files
#48
ivan
opened
8 years ago
0
Non-WSL Windows: use entry_points so that pip3 creates .exe's in Scripts\
#47
ivan
opened
8 years ago
0
Add windows exe build.
#46
luckcolors
closed
8 years ago
8
Crashes on Python 3.5
#45
ivan
closed
8 years ago
2
On 32-bit ARMv7: lmdb.MemoryError: [...]/dupes_db: Cannot allocate memory
#44
ivan
closed
9 years ago
1
Dupespotter has false positives
#43
ivan
opened
9 years ago
1
Write a DIR/skipped_max_content_length file
#42
ivan
closed
9 years ago
1