The next_url API runs an expensive join query on every call, so it needs to be reworked into a queuing system that generates the URL list periodically. We could probably run the query every 5-10 min. Currently, each time we run it we get the full list of URLs and then grab a random one that needs scanning. If we instead populate the queue on a schedule and pop one URL off per request, it'll be far more efficient, since the join runs once per refresh instead of once per call.
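A rough sketch of the queue idea, assuming a single in-process queue. `UrlQueue`, `loader` and `refresh_seconds` are hypothetical names, not existing code. `loader` stands in for whatever currently runs the join query. The list is shuffled on each refresh so popping still hands out URLs in random order, which matches the current "grab a random one" behaviour.

```python
import random
import threading
import time
from collections import deque
from typing import Callable, Deque, Iterable, Optional


class UrlQueue:
    """Pre-computed queue of URLs that need scanning (hypothetical sketch).

    The expensive query (wrapped by `loader`) runs only in refresh(),
    so next_url() is a cheap pop instead of a full join per request.
    """

    def __init__(self, loader: Callable[[], Iterable[str]],
                 refresh_seconds: float = 300.0) -> None:
        self._loader = loader                    # hypothetical wrapper around the join query
        self._refresh_seconds = refresh_seconds  # 300 s = 5 min, low end of the 5-10 min idea
        self._queue: Deque[str] = deque()
        self._lock = threading.Lock()

    def refresh(self) -> None:
        """Run the expensive query once and replace the queue contents."""
        urls = list(self._loader())  # query runs outside the lock so pops aren't blocked
        random.shuffle(urls)         # keep the old "random URL" behaviour
        with self._lock:
            self._queue = deque(urls)

    def next_url(self) -> Optional[str]:
        """Pop the next URL, or return None if the queue is empty until the next refresh."""
        with self._lock:
            return self._queue.popleft() if self._queue else None

    def start_background_refresh(self) -> threading.Thread:
        """Rebuild the queue every refresh_seconds on a daemon thread."""
        def loop() -> None:
            while True:
                try:
                    self.refresh()
                except Exception:
                    pass  # keep the previous queue if a refresh fails; real code should log this
                time.sleep(self._refresh_seconds)

        thread = threading.Thread(target=loop, daemon=True)
        thread.start()
        return thread
```

The next_url handler would then call `start_background_refresh()` once at startup and `next_url()` per request. One side effect worth noting: because URLs are popped rather than picked at random, two scanners shouldn't get the same URL within one refresh window. With multiple app processes this in-memory queue wouldn't be shared, so a shared store would be needed there. That part is an assumption about the deployment, not something stated above.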