ScottMansfield / widow

Distributed, asynchronous web crawler
GNU Lesser General Public License v2.1
26 stars 4 forks

Filter anchor links out of the OUT_LINKS field #8

Closed ScottMansfield closed 9 years ago

ScottMansfield commented 9 years ago

The normalized version of an anchor-only URL is "empty", so these links should be excluded from the start.

e.g. in

<a href="#foo">foo</a>

The link "#foo" should not be included in the list of outgoing links. Evaluate adding a separate set for internal anchor links.
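
A minimal sketch of the filtering described above, written in Python for brevity and not taken from the widow codebase. The function name `split_links` and its parameters are hypothetical. It assumes the crawler already has the raw `href` values from a page: anchor-only links go into a separate set, and everything else is resolved against the page URL with its fragment stripped.

```python
from urllib.parse import urldefrag, urljoin


def split_links(page_url, hrefs):
    """Separate same-page anchor links from real outgoing links.

    Hypothetical helper: returns (out_links, anchor_links) as two sets.
    """
    out_links = set()
    anchor_links = set()
    for href in hrefs:
        href = href.strip()
        # An empty href or one starting with "#" points back at the current
        # page, so it never belongs in the outgoing links.
        if not href or href.startswith("#"):
            if len(href) > 1:
                # Keep named anchors such as "#foo"; drop a bare "#".
                anchor_links.add(href)
            continue
        # Resolve relative links, then drop any fragment so that
        # "other.html#a" and "other.html#b" count as one outgoing link.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        out_links.add(absolute)
    return out_links, anchor_links
```

Called with the page URL `http://example.com/docs/page.html` and the hrefs `"#foo"`, `"/about"`, and `"other.html#section"`, this puts `#foo` in the anchor set and the two resolved URLs in the outgoing set. Whether the anchor set is worth storing at all is the open question the issue raises.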