Suggestions for separate follow-links and fetch-links with possibility of exclusion

fergiemcdowall / norch-fetch

Fetch pure HTML from a webserver and save it to disk

MIT License


Issue #6

Status: Open. eklem opened this issue 10 years ago.

eklem commented 10 years ago

[Image: forage-fetch-idea]

The idea is to:

A: Define the boundaries of the crawl (a site, several sites, a subsite, or a set of subsites).
B: Define an HTML file containing the start URLs.
C: Set the pattern of the URLs to follow.
D: Make one or more exclude patterns for C (i.e. to ensure the crawler doesn't click through to what is ultimately the same page several times).
E: Set the pattern of the URLs to fetch. These can overlap with B, but don't have to.
F: Make one or more exclude patterns for E.
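The A to F steps could map onto a crawl configuration roughly as follows. This is a minimal Python sketch, not part of norch-fetch: `CrawlConfig`, its field names, `start_urls_from_html`, and the example.com URLs are all hypothetical, and the follow/fetch patterns are assumed to be regular expressions matched anywhere in the URL.

```python
import re
from dataclasses import dataclass, field
from html.parser import HTMLParser
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Collects absolute href values from <a> tags (step B)."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the start file's URL.
                    self.links.append(urljoin(self.base_url, value))


def start_urls_from_html(html: str, base_url: str = "") -> list[str]:
    """Step B (hypothetical helper): read start URLs from an HTML file."""
    collector = _LinkCollector(base_url)
    collector.feed(html)
    return collector.links


def _matches_any(patterns: list[str], url: str) -> bool:
    return any(re.search(p, url) for p in patterns)


@dataclass
class CrawlConfig:
    # Hypothetical option names; norch-fetch does not define these.
    boundaries: list[str]                                  # A: URL prefixes the crawl must stay within
    follow: list[str]                                      # C: regexes for links to follow
    follow_exclude: list[str] = field(default_factory=list)  # D: exceptions to C
    fetch: list[str] = field(default_factory=list)         # E: regexes for pages to save
    fetch_exclude: list[str] = field(default_factory=list)   # F: exceptions to E

    def in_bounds(self, url: str) -> bool:
        return any(url.startswith(prefix) for prefix in self.boundaries)

    def should_follow(self, url: str) -> bool:
        return (self.in_bounds(url)
                and _matches_any(self.follow, url)
                and not _matches_any(self.follow_exclude, url))

    def should_fetch(self, url: str) -> bool:
        return (self.in_bounds(url)
                and _matches_any(self.fetch, url)
                and not _matches_any(self.fetch_exclude, url))


# Example configuration (all URLs and patterns are made up).
example = CrawlConfig(
    boundaries=["http://example.com/docs/"],
    follow=[r"/docs/"],
    follow_exclude=[r"[?&]sort=", r"#"],  # sorted views and anchors point to the same page
    fetch=[r"/docs/.+\.html$"],
    fetch_exclude=[r"/docs/print/"],      # printer-friendly duplicates
)
```

With this configuration, `http://example.com/docs/guide/` is followed but not fetched, `http://example.com/docs/list?sort=asc` is not followed, and `http://example.com/docs/print/intro.html` matches the fetch pattern but is dropped by the fetch exclude. Keeping the follow and fetch rules independent is what lets a crawler walk index pages without saving them.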

fergiemcdowall commented 10 years ago

Prettiest issue ever! And a very sensible idea. Maybe this could be fixed on Friday?

eklem commented 10 years ago

You're more than welcome to use time on Friday's hackathon to fix this =)