LAW-Unimi / BUbiNG
The LAW next generation crawler.
http://law.di.unimi.it/software.php#bubing
Apache License 2.0
85 stars
24 forks
issues
URLMatchesRegex seems not to be working
#28
beviah
closed
2 years ago
30
Gracefully recover crawl when unexpectedly stopped
#27
kasparas12
closed
3 years ago
1
Unable to run compiled jar
#26
kasparas12
closed
3 years ago
0
maxUrls config not honored
#25
FiV0
opened
3 years ago
6
ivy.xml outdated
#24
dennis-kao
closed
3 years ago
2
Support wget URI format
#23
pierlauro
closed
4 years ago
1
Update ant/ivy versions & deps
#22
pierlauro
closed
4 years ago
1
Any simple tutorial on how to start BUbiNG?
#21
GaneshVarahade
closed
5 years ago
1
Implementing robots Google parser
#20
vprelovac
closed
2 years ago
0
301 redirects on too many otherwise accessible pages (via wget from same server or browser)
#19
beviah
closed
5 years ago
12
ignores nofollow on button - adds items to cart
#18
mpalasis
opened
6 years ago
0
robots.txt parsed as ISO-8859-1 - breaks when there's a UTF-8 BOM
#17
guillaumepitel
closed
6 years ago
1
WorkbenchEntry-based scheduling
#16
guillaumepitel
closed
6 years ago
8
BUbiNG should parse content streams of length 0
#15
vigna
closed
6 years ago
1
NoSuchMethodException with default configuration (IsProbablyBinary.valueOf())
#14
ChuckNoxis
closed
6 years ago
3
Added ICU4J Charset detection if the Header/Meta detection fails
#13
ChuckNoxis
opened
6 years ago
1
ParsingThread blocked by jgroups
#12
guillaumepitel
closed
6 years ago
3
HTML5 charset declaration not detected
#11
guillaumepitel
opened
6 years ago
2
Distribution not working as expected
#10
guillaumepitel
closed
7 years ago
8
Multiwarc
#9
guillaumepitel
opened
7 years ago
3
Hosts with same IP address are not processed by the same node, so IP delay cannot be enforced
#8
guillaumepitel
closed
6 years ago
8
Update from previous bugfix for HTTPS queries
#7
guillaumepitel
closed
7 years ago
0
HTTPS URLs are actually fetched using HTTP
#6
guillaumepitel
closed
7 years ago
3
SSL certificates are wrongly rejected
#5
guillaumepitel
closed
7 years ago
14
Duplicates or 403 are not taken into account by the maxUrlPerSchemeAuthority limit
#4
guillaumepitel
opened
7 years ago
2
FetchingThreads seem to hang and do nothing
#3
guillaumepitel
closed
7 years ago
5
MultiWarc
#2
guillaumepitel
closed
7 years ago
1
Add Accept headers to request, in order to avoid blocking because of mod_security default
#1
boldip
closed
7 years ago
1