CrawlScript / WebCollector

WebCollector is an open source web crawler framework based on Java. It provides simple interfaces for crawling the Web; you can set up a multi-threaded web crawler in less than 5 minutes.
https://github.com/CrawlScript/WebCollector
GNU General Public License v3.0
3.07k stars, 1.45k forks

How do I add a concatenated URL to the URL queue in the visited method? #60

Closed. wuxiongliu1 closed this issue 7 years ago

wuxiongliu1 commented 7 years ago

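In visited, how can I put a concatenated URL into the queue of URLs waiting to be processed? Calling crawlDatums.add() does add the URL to the queue, but I found that when that page is processed, the links in the page are no longer collected automatically. autoParse is already set to true.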

hujunxianligong commented 7 years ago

You could check whether those links are really present in the HTML, or whether they are loaded by JS.
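
One rough way to run that check is sketched below. It uses only the Java standard library, not WebCollector's API, and assumes you already have the raw HTML of the page as a string (the response body as fetched, before any JavaScript runs). The class name `StaticLinkCheck` and its methods are hypothetical names for illustration, and the regex is a heuristic rather than a real HTML parser.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of WebCollector.
public class StaticLinkCheck {

    // Matches href="..." or href='...' inside <a ...> tags.
    // A rough heuristic: it does not handle unquoted attribute values.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Collects the href values of anchor tags found in the raw HTML.
    public static Set<String> extractHrefs(String html) {
        Set<String> links = new LinkedHashSet<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(2));
        }
        return links;
    }

    // True if the URL appears as an anchor href in the raw HTML,
    // i.e. a parser that does not execute JS could see it.
    public static boolean isInStaticHtml(String html, String url) {
        return extractHrefs(html).contains(url);
    }

    public static void main(String[] args) {
        // Made-up sample page: one static link, one URL that only exists inside a script.
        String html = "<html><body><a href=\"/news/1.html\">one</a>"
                + "<script>loadLinks('/news/2.html');</script></body></html>";
        System.out.println(isInStaticHtml(html, "/news/1.html")); // true
        System.out.println(isInStaticHtml(html, "/news/2.html")); // false
    }
}
```

If the links you expect show up in the browser but not in the set returned by `extractHrefs`, they are most likely injected by JS, and link extraction that works on the fetched HTML alone would not find them.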