CrawlScript WebCollector issues

CrawlScript / WebCollector

WebCollector is an open source web crawler framework based on Java. It provides simple interfaces for crawling the Web; you can set up a multi-threaded web crawler in less than 5 minutes.

https://github.com/CrawlScript/WebCollector

GNU General Public License v3.0

3.07k stars 1.45k forks

issues

refactor: design and implementation smells

#137 bhavya844 opened 7 months ago
0
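A visited page returns a 502 error but still needs to be crawled; calling ExceptionUtils.fail(e) on the exception in visit does not help. How can this be solved?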

#136 Amnesiabht opened 1 year ago
1
Create TestNews.java

#134 HiIamHiep closed 1 year ago
0
Bump jsoup from 1.11.3 to 1.15.3

#133 dependabot[bot] opened 2 years ago
0
Inefficient code detected in RegexRule.java

#132 yinxiL opened 2 years ago
1
Bump mysql-connector-java from 5.1.46 to 8.0.28

#131 dependabot[bot] opened 2 years ago
0
Content returned by ContentExtractor.getContentByUrl has no blank lines or other formatting

#130 AmberYang678 opened 2 years ago
1
Bump gson from 2.8.5 to 2.8.9

#129 dependabot[bot] opened 2 years ago
0
Bump jsoup from 1.11.3 to 1.14.2

#128 dependabot[bot] closed 2 years ago
1
Bug in the automatic detection of news publication time

#127 KTsama closed 8 months ago
1
Cannot join any of the official groups; they all report being full

#126 jiangqiang1996 closed 3 years ago
1
How is the accuracy in the paper calculated?

#125 fubicheng208 opened 3 years ago
0
How to handle a 307 response when visiting a link?

#124 nikesb23 opened 4 years ago
0
Bump junit from 4.12 to 4.13.1

#123 dependabot[bot] opened 4 years ago
0
Bump mysql-connector-java from 5.1.46 to 8.0.16

#122 dependabot[bot] closed 2 years ago
1
Remove logging

#121 wangqifan opened 4 years ago
1
Out of memory problem

#120 wangqifan opened 4 years ago
0
Should the hour part of the time-extraction regex be changed to [0-9]?

#118 bigzhouj opened 4 years ago
0
RocksDBException when running the CSDN crawling example code: Failed to create a directory: C:\code\weibocrawler\crawl\crawldb: The system cannot find the specified…

#117 jack13163 opened 4 years ago
3
The computeInfo function in ContentExtractor can throw StackOverflowError

#116 yanpeng opened 4 years ago
3
Error when running the CSDN blog crawling source code from the tutorial

#115 dyn1721 opened 5 years ago
1
Where is the distributed version?

#114 xiaowenhuman opened 5 years ago
0
How to ignore expired HTTPS certificates in version 2.73-alpha?

#113 hj287678654 opened 5 years ago
2
Bump c3p0 from 0.9.5.2 to 0.9.5.4

#112 dependabot[bot] opened 5 years ago
0
How to resolve too many database connections inside the crawler?

#111 linye271709915 opened 5 years ago
0
add unit tests for ContentExtractor

#110 tuantran37 opened 5 years ago
0
Can the log level for thrown exceptions be changed to warn or error?

#109 xiejx618 opened 5 years ago
0
Chinese text in fetched pages is garbled in the output when extending BreadthCrawler

#108 linye271709915 opened 5 years ago
2
Add demo for selenium crawler with cookie

#107 smallyunet opened 5 years ago
3
How to crawl data from front-end rendered pages with WebCollector?

#106 qiuqiu0802 opened 5 years ago
0
The release package contains a log4j configuration file, which overrides users' own log4j configuration

#104 gaoxjin closed 5 years ago
3
RocksDBException is always thrown after crawling for a while; the cause is unclear

#103 tanwubo opened 5 years ago
2
WebCollector discussion group

#102 mdzz9527 opened 6 years ago
8
Update README.md

#101 mdzz9527 opened 6 years ago
0
Update DemoCookieCrawler.java

#100 mdzz9527 closed 6 years ago
0
Update README.md

#99 mdzz9527 opened 6 years ago
0
Update DemoCookieCrawler.java

#98 mdzz9527 closed 6 years ago
0
Is the source code of the WebCollector-Hadoop version publicly available?

#97 coderf187 closed 6 years ago
1
Is there a related discussion group?

#96 liushaofeng89 opened 6 years ago
2
OkHttp ConnectionPool and Okio Watchdog do not seem to be shut down properly

#95 lewiswu1209 opened 6 years ago
4
Can the depth be set so that crawling continues as long as there are links?

#94 hxq201300 closed 6 years ago
1
Setting the User-Agent has no effect in the new version

#93 CNdarkmoon opened 6 years ago
1
Hello! LockTimeoutException

#92 simplecnst closed 6 years ago
1
How to determine when the crawler has finished?

#91 djxhero closed 6 years ago
1
Redirects

#90 YYSpace closed 6 years ago
4
Hello, RamCrawler with about 70 seeds added gives unstable results

#89 gaoda1234 closed 6 years ago
3
Can the stop method of the StrategyCrawler class stop crawling immediately?

#88 BeQiang closed 6 years ago
1
How to use this framework to crawl data from mobile apps?

#87 mdzz9527 closed 6 years ago
1
NewsCrawler.java from the official website's configuration tutorial reports errors

#86 MrKingHH closed 6 years ago
1
Only some of the injected URLs are executed

#85 mdzz9527 closed 6 years ago
4