CrawlScript / WebCollector

WebCollector is an open source web crawler framework based on Java. It provides simple interfaces for crawling the Web; you can set up a multi-threaded web crawler in less than 5 minutes.
https://github.com/CrawlScript/WebCollector
GNU General Public License v3.0
3.07k stars 1.45k forks

Main content extraction question #75

Closed Kaneki-x closed 6 years ago

Kaneki-x commented 6 years ago

Can the extracted main text of a web page keep the same line breaks and paragraphs as the original?

hujunxianligong commented 6 years ago

The page-extraction API provides two modes: one extracts the text, and the other extracts an Element. The Element preserves the formatting.
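
For readers who want plain text that still has paragraph breaks, here is a rough sketch of one way to post-process the Element mode. It assumes the extracted Element's markup is available as an HTML string (for example via jsoup's `Element.html()`, if the returned Element is a jsoup Element; the thread does not confirm this). `ParagraphText` and `toText` are hypothetical names for illustration and are not part of WebCollector. The regex approach is deliberately simple: it does not decode HTML entities or handle malformed markup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper (not a WebCollector class): HTML fragment -> text with paragraph breaks. */
public class ParagraphText {

    // Tags treated as paragraph/line boundaries (an illustrative, incomplete list).
    private static final Pattern BLOCK_BREAK = Pattern.compile(
            "(?i)<br\\s*/?>|</(p|div|li|h[1-6]|tr|blockquote)\\s*>");

    // Any tag still present after the boundaries have been converted.
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]+>");

    public static String toText(String html) {
        // Source whitespace (including newlines) is not meaningful in HTML, so collapse it first.
        String collapsed = html.replaceAll("\\s+", " ");
        // Turn block boundaries into real newlines, then drop all remaining tags.
        String withBreaks = BLOCK_BREAK.matcher(collapsed).replaceAll("\n");
        String noTags = ANY_TAG.matcher(withBreaks).replaceAll("");

        // Keep one non-empty, trimmed line per paragraph.
        List<String> paragraphs = new ArrayList<>();
        for (String line : noTags.split("\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                paragraphs.add(trimmed);
            }
        }
        return String.join("\n", paragraphs);
    }

    public static void main(String[] args) {
        // Assumed input: the HTML of the extracted content element.
        String html = "<div><p>First paragraph.</p><p>Second <b>paragraph</b>.</p></div>";
        System.out.println(toText(html));
    }
}
```

If the Element is indeed a jsoup Element, walking its child nodes directly would be more robust than regular expressions; the string-based version is shown only because it runs without extra dependencies.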