zaibacu closed this issue 3 months ago
Hi @zaibacu, one option would be to track the number of URLs already fetched for a queue externally and block the queue with BlockQueueUntil. This currently can't be done within URLFrontier itself. If you go that route, you could use the getCountCompleted information available for each queue.
It could be an interesting feature. Happy to discuss it further if you are open to contributing it.
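The external approach suggested above (count fetched URLs per queue, block the queue once a limit is reached, reset the count once it becomes fetchable again) could look roughly like the sketch below. It is a hedged illustration, not part of URLFrontier: `PerQueueQuota`, `record_fetch` and the `block_fn` callback are hypothetical names, and the actual BlockQueueUntil call is deliberately left behind that callback rather than guessing at a client API. Expressing the unblock time in epoch seconds is also an assumption.

```python
import time
from collections import defaultdict
from typing import Callable, Dict

# Hypothetical callback: in a real setup this would wrap the URLFrontier
# BlockQueueUntil call. Here it only receives (queue_key, unblock_epoch_seconds).
BlockFn = Callable[[str, int], None]


class PerQueueQuota:
    """Counts fetched URLs per queue and blocks a queue once it hits `limit`.

    Assumption: the caller reports every completed fetch via record_fetch(),
    e.g. derived from the queue's completed count.
    """

    def __init__(self, limit: int, block_seconds: int, block_fn: BlockFn,
                 clock: Callable[[], float] = time.time) -> None:
        self.limit = limit
        self.block_seconds = block_seconds
        self.block_fn = block_fn
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.counts: Dict[str, int] = defaultdict(int)
        self.blocked_until: Dict[str, int] = {}

    def record_fetch(self, key: str) -> bool:
        """Record one fetched URL for `key`; return True if this fetch triggered a block."""
        now = self.clock()
        until = self.blocked_until.get(key)
        if until is not None and now >= until:
            # The block has expired, so the queue is fetchable again: reset its counter.
            del self.blocked_until[key]
            self.counts[key] = 0
        self.counts[key] += 1
        if self.counts[key] >= self.limit and key not in self.blocked_until:
            until = int(now) + self.block_seconds
            self.blocked_until[key] = until
            self.block_fn(key, until)
            return True
        return False
```

For example, with `limit=2` and `block_seconds=60`, the second fetch for `example.com` at time 1000 calls `block_fn("example.com", 1060)`, and the first fetch reported after time 1060 starts the count again from 1. Injecting the clock keeps the counting logic independent of how and when the real crawler reports fetches.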
Thank you! I think I'll start with BlockQueueUntil, since I need a quick solution, and later come back with a feature for URLFrontier itself.
Fixed in #91
We have a business use case where we want to crawl only up to a certain number of URLs per domain. Once the domain becomes refetchable, the counter should simply be reset and fetching should start again.
As far as I can see, there is currently no option to do that. I'm willing to contribute custom code for this, but I want to make sure it fits the overall design of the library. Or is there perhaps already a configuration option for this?