CrawlScript / WebCollector

WebCollector is an open-source web crawler framework written in Java. It provides simple interfaces for crawling the Web, so you can set up a multi-threaded web crawler in less than 5 minutes.
https://github.com/CrawlScript/WebCollector
GNU General Public License v3.0
3.07k stars, 1.45k forks

Update ContentExtractor time parser #47

Open sundy-li opened 8 years ago

sundy-li commented 8 years ago

Fix: #46. Update ContentExtractor: precompile and update the time regexp pattern so that ContentExtractor is faster and more accurate.

Update .gitignore: exclude unused folders and files from git.
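
For readers unfamiliar with the ContentExtractor change, here is a minimal sketch of what precompiling a time pattern can look like in Java. It is not the code from this PR: the class name `TimePatternSketch`, the method `extractTime`, and the regular expression itself are hypothetical and chosen only for illustration. The idea is that `Pattern.compile` runs once when the class is loaded, rather than on every extraction call.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not the actual WebCollector ContentExtractor code.
final class TimePatternSketch {

    // Compiled once at class-load time and reused by every call.
    // Assumed format: yyyy-M-d (also "/" or "." as separators),
    // optionally followed by H:mm or H:mm:ss.
    private static final Pattern TIME_PATTERN = Pattern.compile(
            "(\\d{4})[-/.](\\d{1,2})[-/.](\\d{1,2})"
            + "(?:\\s+(\\d{1,2}):(\\d{2})(?::(\\d{2}))?)?");

    private TimePatternSketch() {
    }

    // Returns the first date (and time, if present) found in the text,
    // normalized to "yyyy-MM-dd[ HH:mm[:ss]]", or empty if none is found.
    static Optional<String> extractTime(String text) {
        Matcher m = TIME_PATTERN.matcher(text);
        if (!m.find()) {
            return Optional.empty();
        }
        StringBuilder sb = new StringBuilder();
        sb.append(String.format("%s-%02d-%02d",
                m.group(1),
                Integer.parseInt(m.group(2)),
                Integer.parseInt(m.group(3))));
        if (m.group(4) != null) {
            // Time part is present: pad the hour, keep minutes as matched.
            sb.append(String.format(" %02d:%s",
                    Integer.parseInt(m.group(4)), m.group(5)));
            if (m.group(6) != null) {
                sb.append(':').append(m.group(6));
            }
        }
        return Optional.of(sb.toString());
    }
}
```

With this sketch, `TimePatternSketch.extractTime("Published 2016-3-7 9:05 by admin")` yields `2016-03-07 09:05`, and text without a date yields an empty `Optional`. A `Pattern` instance is immutable and safe to share across threads, which matters for a multi-threaded crawler; each call creates its own `Matcher`.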