ScottMansfield / widow

Distributed, asynchronous web crawler
GNU Lesser General Public License v2.1

Add links by content type to the main page data #9

Open ScottMansfield opened 9 years ago

ScottMansfield commented 9 years ago

First, the links collected by content type should be consolidated so that content types that are similar, but not exact string matches, end up in the same group.

E.g.

text/html; charset=utf-8
text/html; charset=UTF-8
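
One possible way to do the consolidation is sketched below, in Python purely for illustration. The function names `normalize_content_type` and `group_links_by_content_type` are hypothetical and not part of widow. The sketch assumes it is enough to lowercase the media type, parameter names, and the charset value, and that parameter values never contain a `;`.

```python
from collections import defaultdict


def normalize_content_type(raw: str) -> str:
    """Return a canonical form of a Content-Type value (hypothetical helper).

    Assumption: parameter values do not contain ';', so a plain split is enough.
    """
    media_type, *params = raw.split(";")
    media_type = media_type.strip().lower()

    normalized = []
    for param in params:
        name, sep, value = param.partition("=")
        name = name.strip().lower()
        if not name:
            continue  # skip empty segments such as a trailing ';'
        value = value.strip().strip('"')
        if name == "charset":
            value = value.lower()  # charset names are case-insensitive
        normalized.append(f"{name}={value}" if sep else name)

    # Sort parameters so their order does not create separate groups.
    return "; ".join([media_type] + sorted(normalized))


def group_links_by_content_type(links):
    """Group (url, content_type) pairs under the normalized content type."""
    grouped = defaultdict(list)
    for url, content_type in links:
        grouped[normalize_content_type(content_type)].append(url)
    return dict(grouped)
```

With this, both values from the example above map to the same key, `text/html; charset=utf-8`, so their links would be listed together.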