HtmlParseData should hold a unique list of URLs #291

asepaprianto / crawler4j

Automatically exported from code.google.com/p/crawler4j

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago

Currently HtmlParseData holds a non-unique list of the links in a page: if a URL appears in the page several times, it also appears in the list of links several times.

I can't think of a scenario where somebody parses an HTML page and wants the same link more than once.

We should hold a Set instead of a List, so that each link is stored only once.
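
To illustrate the proposal, here is a minimal sketch in plain Java. It is not the crawler4j code: `UniqueLinksSketch` and `uniqueLinks` are hypothetical names, and links are modelled as plain strings for simplicity. It assumes insertion order is worth keeping, hence `LinkedHashSet` rather than `HashSet`.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class UniqueLinksSketch {

    // Hypothetical helper (not part of crawler4j): collapses repeated links
    // while keeping the order in which each link was first seen.
    public static Set<String> uniqueLinks(List<String> links) {
        return new LinkedHashSet<>(links);
    }

    public static void main(String[] args) {
        // The same URL appears twice, as it might on a real page.
        List<String> links = Arrays.asList(
                "http://example.com/a",
                "http://example.com/b",
                "http://example.com/a");

        Set<String> unique = uniqueLinks(links);
        System.out.println(unique.size()); // prints 2
        System.out.println(unique);        // prints [http://example.com/a, http://example.com/b]
    }
}
```

If the links are held as objects rather than strings, the element type would need consistent `equals` and `hashCode` implementations for the set to treat two occurrences of the same URL as one entry.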

Original issue reported on code.google.com by avrah...@gmail.com on 21 Aug 2014 at 11:07

GoogleCodeExporter commented 9 years ago

All examples should be fixed accordingly.

Original comment by avrah...@gmail.com on 21 Aug 2014 at 11:08

GoogleCodeExporter commented 9 years ago

Fixed in Revision: f5ec5157fcf4

Original comment by avrah...@gmail.com on 21 Aug 2014 at 11:21

Changed state: Fixed