CrawlScript/WebCollector
WebCollector is an open-source web crawler framework based on Java. It provides simple interfaces for crawling the Web; you can set up a multi-threaded web crawler in less than 5 minutes.
https://github.com/CrawlScript/WebCollector
GNU General Public License v3.0
3.07k stars
1.45k forks
issues
refactor: design and implementation smells
#137
bhavya844
opened
7 months ago
0
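The visited page returns a 502 error but still needs to be fetched; setting ExceptionUtils.fail(e) for the visit exception still doesn't work. How can this be solved?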
#136
Amnesiabht
opened
1 year ago
1
Create TestNews.java
#134
HiIamHiep
closed
1 year ago
0
Bump jsoup from 1.11.3 to 1.15.3
#133
dependabot[bot]
opened
2 years ago
0
Inefficient code detected in RegexRule.java
#132
yinxiL
opened
2 years ago
1
Bump mysql-connector-java from 5.1.46 to 8.0.28
#131
dependabot[bot]
opened
2 years ago
0
Content returned by ContentExtractor.getContentByUrl has no formatting such as blank lines
#130
AmberYang678
opened
2 years ago
1
Bump gson from 2.8.5 to 2.8.9
#129
dependabot[bot]
opened
2 years ago
0
Bump jsoup from 1.11.3 to 1.14.2
#128
dependabot[bot]
closed
2 years ago
1
Bug in the automatic news time detection
#127
KTsama
closed
8 months ago
1
Folks, none of the official groups can be joined anymore; they all say they are full
#126
jiangqiang1996
closed
3 years ago
1
How is the accuracy in the paper calculated?
#125
fubicheng208
opened
3 years ago
0
How to handle a 307 response when accessing a link?
#124
nikesb23
opened
4 years ago
0
Bump junit from 4.12 to 4.13.1
#123
dependabot[bot]
opened
4 years ago
0
Bump mysql-connector-java from 5.1.46 to 8.0.16
#122
dependabot[bot]
closed
2 years ago
1
Remove logging
#121
wangqifan
opened
4 years ago
1
Out of memory problem
#120
wangqifan
opened
4 years ago
0
Should the hour part of the time-extraction regex be changed to [0-9]?
#118
bigzhouj
opened
4 years ago
0
RocksDBException when running the CSDN crawling example code: Failed to create a directory: C:\code\weibocrawler\crawl\crawldb: The system cannot find the specified…
#117
jack13163
opened
4 years ago
3
The computeInfo function in ContentExtractor can throw StackOverflowError
#116
yanpeng
opened
4 years ago
3
Error when running the tutorial's source code for crawling CSDN blogs
#115
dyn1721
opened
5 years ago
1
Where is the distributed version?
#114
xiaowenhuman
opened
5 years ago
0
How to ignore expired HTTPS certificates in version 2.73-alpha?
#113
hj287678654
opened
5 years ago
2
Bump c3p0 from 0.9.5.2 to 0.9.5.4
#112
dependabot[bot]
opened
5 years ago
0
How to solve the problem of too many database connections inside the crawler?
#111
linye271709915
opened
5 years ago
0
add unit tests for ContentExtractor
#110
tuantran37
opened
5 years ago
0
Can the log level for thrown exceptions be changed to warn or error?
#109
xiejx618
opened
5 years ago
0
When extending BreadthCrawler, Chinese text fetched from web pages is output as garbled characters
#108
linye271709915
opened
5 years ago
2
Add demo for selenium crawler with cookie
#107
smallyunet
opened
5 years ago
3
How to crawl data from front-end rendered pages with webcollector?
#106
qiuqiu0802
opened
5 years ago
0
The release package contains a log4j configuration file, which overrides users' own log4j configuration
#104
gaoxjin
closed
5 years ago
3
RocksDBException is always thrown after crawling for a while; the cause is unclear
#103
tanwubo
opened
5 years ago
2
WebCollector discussion group
#102
mdzz9527
opened
6 years ago
8
Update README.md
#101
mdzz9527
opened
6 years ago
0
Update DemoCookieCrawler.java
#100
mdzz9527
closed
6 years ago
0
Update README.md
#99
mdzz9527
opened
6 years ago
0
Update DemoCookieCrawler.java
#98
mdzz9527
closed
6 years ago
0
Is the source code of the WebCollector-Hadoop version public?
#97
coderf187
closed
6 years ago
1
Is there a related discussion group?
#96
liushaofeng89
opened
6 years ago
2
OkHttp ConnectionPool and Okio Watchdog do not seem to be shut down properly
#95
lewiswu1209
opened
6 years ago
4
Can the depth be set so that the next crawl round runs as long as there are links?
#94
hxq201300
closed
6 years ago
1
Setting the UA has no effect in the new version
#93
CNdarkmoon
opened
6 years ago
1
Hello! LockTimeoutException
#92
simplecnst
closed
6 years ago
1
How to determine when the crawler has finished
#91
djxhero
closed
6 years ago
1
Redirects
#90
YYSpace
closed
6 years ago
4
Hello, RamCrawler with about 70 seeds added gives unstable results
#89
gaoda1234
closed
6 years ago
3
Can the stop method of the StrategyCrawler class stop the crawler immediately?
#88
BeQiang
closed
6 years ago
1
How to use this framework to crawl data from mobile apps?
#87
mdzz9527
closed
6 years ago
1
NewsCrawler.java from the official website's configuration tutorial reports an error
#86
MrKingHH
closed
6 years ago
1
Only some of the injected URLs are executed
#85
mdzz9527
closed
6 years ago
4