The idea is to:
A: Define the boundaries of the crawl (a site, several sites, a subsite, or a set of subsites)
B: Define an HTML file containing the start URLs
C: Set the pattern of the URLs to follow
D: Make one or more exclude patterns for C (i.e. to avoid following what is ultimately the same page several times)
E: Set the pattern of the URLs to fetch. These can overlap with B, but don't have to.
F: Make one or more exclude patterns for E
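A minimal sketch of how A-F could fit together, assuming a plain Python/stdlib crawler; the host, file name, and regex patterns (ALLOWED_HOSTS, START_FILE, FOLLOW_PATTERN, FETCH_PATTERN, and their excludes) are illustrative placeholders, not settings of any particular crawler:

import re
import urllib.request
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# A: boundary of the crawl -- stay on these hosts (assumed example host)
ALLOWED_HOSTS = {"www.example.org"}

# B: a local HTML file holding the start URLs as ordinary <a href="..."> links
START_FILE = "start-urls.html"

# C/D: which links to follow, and which of those to skip
FOLLOW_PATTERN = re.compile(r"/articles/")
FOLLOW_EXCLUDE = re.compile(r"[?&](sort|print)=")   # same page, different view

# E/F: which URLs to actually fetch/store, and which of those to skip
FETCH_PATTERN = re.compile(r"/articles/\d+")
FETCH_EXCLUDE = re.compile(r"\.pdf$")

class LinkExtractor(HTMLParser):
    """Collect href attributes from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html, base_url):
    parser = LinkExtractor()
    parser.feed(html)
    return [urljoin(base_url, link) for link in parser.links]

def crawl():
    with open(START_FILE, encoding="utf-8") as f:
        start_links = extract_links(f.read(), "")     # B: absolute start URLs
    queue = deque(start_links)
    seen = set(start_links)

    while queue:
        url = queue.popleft()
        if urlparse(url).hostname not in ALLOWED_HOSTS:          # A
            continue
        try:
            with urllib.request.urlopen(url) as resp:
                page = resp.read().decode("utf-8", errors="replace")
        except OSError:
            continue

        if FETCH_PATTERN.search(url) and not FETCH_EXCLUDE.search(url):      # E, F
            print("fetched:", url)   # here you would store/process the page

        for link in extract_links(page, url):
            if link in seen:
                continue
            if FOLLOW_PATTERN.search(link) and not FOLLOW_EXCLUDE.search(link):  # C, D
                seen.add(link)
                queue.append(link)

if __name__ == "__main__":
    crawl()

In this sketch every in-boundary page is downloaded so its links can be followed (C/D), while only pages matching the fetch pattern (E) and not its excludes (F) are treated as results.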