CrawlScript / WebCollector

WebCollector is an open-source web crawler framework written in Java. It provides simple interfaces for crawling the Web, and you can set up a multi-threaded web crawler in less than 5 minutes.
https://github.com/CrawlScript/WebCollector
GNU General Public License v3.0
3.07k stars 1.45k forks

ContentExtractor time parsing is inaccurate #46

Open sundy-li opened 8 years ago

sundy-li commented 8 years ago

I tried it on a few news sites:

http://www.leiphone.com/news/201609/AtW1F5zt6GS1ru9Y.html https://www.huxiu.com/article/167883.html

Time parsing accuracy on these pages is very low. Please consider improving the time regex.
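To make the request more concrete, here is a rough sketch of the kind of more tolerant pattern that might help. It is not WebCollector's current regex and not a proposed patch: the class name `PublishTimeSketch`, the method `extract`, and the set of supported formats (`2016-09-18 10:30`, `2016/09/18`, `2016年9月18日 08:05:09`) are all assumptions for illustration, and they have not been checked against the pages linked above.

```java
import java.time.DateTimeException;
import java.time.LocalDateTime;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of WebCollector.
public class PublishTimeSketch {

    // Date part: a 4-digit year (19xx or 20xx), then month and day, separated by
    // '-', '/', '.' or the CJK year/month markers (U+5E74, U+6708), with an
    // optional CJK day marker (U+65E5) after the day.
    // Time part (optional): HH:mm with optional :ss.
    private static final Pattern TIME = Pattern.compile(
            "(?<!\\d)((?:19|20)\\d{2})\\s*[-/.\u5e74]\\s*(\\d{1,2})\\s*[-/.\u6708]\\s*(\\d{1,2})\u65e5?"
            + "(?:\\s*(\\d{1,2}):(\\d{2})(?::(\\d{2}))?)?");

    // Returns the first candidate in the text that is also a valid calendar date-time.
    public static Optional<LocalDateTime> extract(String text) {
        Matcher m = TIME.matcher(text);
        while (m.find()) {
            try {
                int year = Integer.parseInt(m.group(1));
                int month = Integer.parseInt(m.group(2));
                int day = Integer.parseInt(m.group(3));
                // Missing time fields default to 0.
                int hour = m.group(4) == null ? 0 : Integer.parseInt(m.group(4));
                int minute = m.group(5) == null ? 0 : Integer.parseInt(m.group(5));
                int second = m.group(6) == null ? 0 : Integer.parseInt(m.group(6));
                return Optional.of(LocalDateTime.of(year, month, day, hour, minute, second));
            } catch (DateTimeException e) {
                // Looks like a date but is not one (e.g. month 13); keep scanning.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extract("Posted 2016-09-18 10:30 by editor"));
        System.out.println(extract("2016\u5e749\u670818\u65e5 08:05:09"));
    }
}
```

Validating each candidate with `LocalDateTime.of` rather than trusting the regex alone is a deliberate choice in this sketch: it lets the pattern stay loose about separators while still rejecting digit runs that only resemble a date.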