ScottMansfield / widow

Distributed, asynchronous web crawler
GNU Lesser General Public License v2.1

Have a better story around local caching independent of the crawling stages #6

Open ScottMansfield opened 9 years ago

ScottMansfield commented 9 years ago

The fetch, parse, and (maybe) index stages should make heavy use of caching to avoid duplicating work. Right now the cache is either a fragile connection to a single EC2 instance or an in-memory, in-process cache that does not survive restarts. A local option should be easy to add.
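One way to keep caching independent of the crawling stages is to have each stage depend only on a small cache interface and choose the backend (in-process, local disk, remote) at startup. The sketch below illustrates that idea. All names (PageCache, InMemoryPageCache, FilePageCache, CachingStage) are hypothetical and do not come from widow's codebase. It assumes string values keyed by URL and Java 17 or later (for HexFormat). The file-backed variant is one possible "local option" whose entries survive restarts without any external server.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical cache abstraction: stages depend only on this interface. */
interface PageCache {
    Optional<String> get(String key);
    void put(String key, String value);
}

/** In-process cache: fast, but its contents are lost when the process restarts. */
final class InMemoryPageCache implements PageCache {
    private final ConcurrentMap<String, String> map = new ConcurrentHashMap<>();

    public Optional<String> get(String key) { return Optional.ofNullable(map.get(key)); }
    public void put(String key, String value) { map.put(key, value); }
}

/** Local file-backed cache: one file per key, so entries survive restarts. */
final class FilePageCache implements PageCache {
    private final Path dir;

    FilePageCache(Path dir) {
        try {
            this.dir = Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hash the key (e.g. a URL) so it always maps to a safe file name.
    private Path fileFor(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            return dir.resolve(HexFormat.of().formatHex(digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    public Optional<String> get(String key) {
        Path file = fileFor(key);
        if (!Files.exists(file)) return Optional.empty();
        try {
            return Optional.of(Files.readString(file));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public void put(String key, String value) {
        try {
            Files.writeString(fileFor(key), value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

/** Example stage helper: consult the cache before doing the expensive work. */
final class CachingStage {
    static String getOrCompute(PageCache cache, String key, Function<String, String> work) {
        Optional<String> hit = cache.get(key);
        if (hit.isPresent()) return hit.get();
        String result = work.apply(key);
        cache.put(key, result);
        return result;
    }
}
```

A fetch stage would then call something like CachingStage.getOrCompute(cache, url, this::download) without knowing which backend is in use. Note that this sketch writes files in place rather than atomically, so a crash mid-write could leave a truncated entry. Writing to a temporary file and then moving it into place would avoid that.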

A Gradle task wired in with dependsOn would probably do.

ScottMansfield commented 9 years ago

Running a local Redis server seems to work, but it's still not ideal since it requires Redis to be installed.