Throttle requests to a domain by total bandwidth in a specified period of time

from @truthpickle via livecoding.tv:

The crawler should keep track of the total bandwidth used per domain and limit it to a specified amount in a specified period of time, e.g. 1 GB / month or 400 MB / week. Once the limit is passed, the fetch stage can simply stop retrieving pages for that domain. The parse stage can tolerate a little softness, but if the limit has been exceeded by too much, the page should be dropped from the pipeline.
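One way to picture the behaviour described above is a small per-domain byte counter with a soft and a hard threshold. The following is only a sketch under stated assumptions, not code from the project: the class name `BandwidthBudget`, its methods, the `softness` parameter, and the choice of a fixed (non-sliding) accounting window are all hypothetical, and it is written in Python purely for illustration.

```python
import time
from dataclasses import dataclass


@dataclass
class _Window:
    start: float   # clock reading when the current period began
    used: int = 0  # bytes counted against the domain in this period


class BandwidthBudget:
    """Hypothetical per-domain bandwidth budget over a fixed time window."""

    def __init__(self, limit_bytes, period_seconds, softness=0.1,
                 clock=time.monotonic):
        self.limit_bytes = limit_bytes
        self.period_seconds = period_seconds
        # Assumption: "a little softness" is modelled as a fractional
        # overshoot allowed before pages are dropped at parse time.
        self.hard_limit = int(limit_bytes * (1 + softness))
        self._clock = clock
        self._windows = {}

    def _window(self, domain):
        # Start a fresh window the first time a domain is seen, or once
        # the previous period has fully elapsed.
        now = self._clock()
        window = self._windows.get(domain)
        if window is None or now - window.start >= self.period_seconds:
            window = _Window(start=now)
            self._windows[domain] = window
        return window

    def record(self, domain, nbytes):
        """Count nbytes of downloaded data against the domain."""
        self._window(domain).used += nbytes

    def may_fetch(self, domain):
        """Fetch stage: retrieve only while usage is under the limit."""
        return self._window(domain).used < self.limit_bytes

    def should_drop(self, domain):
        """Parse stage: drop the page once usage passes the hard limit."""
        return self._window(domain).used > self.hard_limit
```

With these assumptions, "1 GB / month" could be expressed as `BandwidthBudget(limit_bytes=10**9, period_seconds=30 * 24 * 3600)`, treating a month as 30 days. A fixed window is the simplest choice; a sliding window would avoid a burst right after each reset, at the cost of more bookkeeping.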

ScottMansfield/widow, issue #19