From time to time, credentials for various services will get flagged for scraping and shut down, either temporarily or permanently.
The #1 concern is stopping scraping for that service immediately, so we don't keep throwing more and more errors and potentially turn a temporary suspension into a permanent one. How do we do this? Probably by creating a new Setting (which we already use for API keys, so it's very easy to set up) that keeps track of whether each service is enabled and gets checked before scraping. This adds another DB lookup for every job, but honestly? Not a big deal.
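A minimal sketch of that per-service check, assuming the existing `Setting` behaves like a simple key/value store. Its real interface isn't shown in these notes, so `SettingStore` below is a hypothetical in-memory stand-in, and `ScrapeJob`, `BlockedError`, and the `scraping_enabled.<service>` key format are illustrative names only.

```ruby
# Hypothetical in-memory stand-in for the existing Setting model (key/value store).
class SettingStore
  def initialize
    @values = {}
  end

  def get(key, default = nil)
    @values.fetch(key, default)
  end

  def set(key, value)
    @values[key] = value
  end
end

# Hypothetical error a scraper raises when the service appears to have blocked us.
class BlockedError < StandardError; end

# Hypothetical job wrapper: checks the per-service flag before doing any work.
class ScrapeJob
  def self.flag_key(service)
    "scraping_enabled.#{service}"
  end

  def initialize(settings)
    @settings = settings
  end

  # The one extra lookup per job; services default to enabled.
  def enabled?(service)
    @settings.get(self.class.flag_key(service), true)
  end

  # Yields to the actual scrape only if the service is enabled.
  def perform(service)
    return :skipped unless enabled?(service)

    yield
    :done
  rescue BlockedError
    # Flip the flag right away so follow-up jobs stop hitting the service.
    @settings.set(self.class.flag_key(service), false)
    :disabled
  end
end
```

Defaulting to enabled means a new service scrapes without any setup; flipping the flag inside the `rescue` is what stops the follow-up jobs immediately.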
The #2 concern is how to keep the scraping going. Right now we store a single set of credentials. We may consider adding a stack of credentials, where if one goes down the next one is automatically tried instead. If that works, it's then set as the default, and the old ones are tagged for checking.
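The failover behaviour could look roughly like this. It's a sketch under assumptions: `CredentialStack` and `OutOfCredentialsError` are hypothetical names, and the block passed to `with_failover` is assumed to return a truthy value on success and a falsy one when the service rejects the credential.

```ruby
# Hypothetical error raised when every credential in the stack has failed.
class OutOfCredentialsError < StandardError; end

# Hypothetical ordered stack of credentials for a single service.
class CredentialStack
  attr_reader :active, :flagged

  def initialize(credentials)
    @active = credentials.dup # first element is the current default
    @flagged = []             # credentials tagged for checking
  end

  def default
    @active.first
  end

  # Tries credentials in order until the block returns a truthy value.
  # Failed credentials are shifted off the front and flagged, so the one
  # that works is left at the front as the new default.
  def with_failover
    until @active.empty?
      result = yield(@active.first)
      return result if result

      @flagged << @active.shift
    end
    raise OutOfCredentialsError, "all credentials are flagged"
  end

  # Puts a cleared credential back at the end of the stack.
  def reinstate(credential)
    @active << credential if @flagged.delete(credential)
  end
end
```

Appending reinstated credentials to the back, rather than the front, keeps a freshly cleared key from being used first again; that ordering is a judgment call rather than something these notes require.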
Managing a credential stack like this would probably require an actual UI, or we could maybe do it with a rake task/CLI implementation.
This UI would have to include the following:
Status of each source scraper ("paused", "running", "out of credentials")
Ability to add new credentials without restarting the server
Ability to view current and problematic credentials
Ability to reinsert credentials into the stack once they've been cleared
Ability to stop all scraping (a "Scramble button")
Ability to pause and start individual scrapers
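The status and pause/start controls listed above could share one small object that both a UI and a rake task/CLI call into. This is a hypothetical sketch: `ScraperControl` and its method names are illustrative, and the in-memory hash would in practice need to be persisted (for example, alongside the per-service Setting flags).

```ruby
# Hypothetical control object shared by a UI and a rake task/CLI.
class ScraperControl
  STATUSES = %w[running paused out_of_credentials].freeze

  def initialize(services)
    # Every known scraper starts out running.
    @status = services.to_h { |service| [service, "running"] }
  end

  def status(service)
    @status.fetch(service) # raises KeyError for unknown scrapers
  end

  def statuses
    @status.dup
  end

  def pause(service)
    set(service, "paused")
  end

  def start(service)
    set(service, "running")
  end

  def mark_out_of_credentials(service)
    set(service, "out_of_credentials")
  end

  # The "Scramble button": pause every scraper that is currently running.
  def scramble!
    @status.transform_values! { |value| value == "running" ? "paused" : value }
  end

  # Jobs would check this before scraping.
  def scraping_allowed?(service)
    status(service) == "running"
  end

  private

  def set(service, value)
    raise KeyError, "unknown scraper: #{service}" unless @status.key?(service)
    raise ArgumentError, "unknown status: #{value}" unless STATUSES.include?(value)

    @status[service] = value
  end
end
```

Here `scramble!` only pauses scrapers that are currently running, so one that is out of credentials keeps that status instead of having it overwritten.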
In addition, we'll have to do the following:
Add the ability to pause certain scrapes.
How? We could have the jobs requeue themselves, but that doesn't really stop anything.
We could have them shuffle themselves off into a database of stopped jobs, then get re-enqueued when scraping starts back up again.
We could have a scanner that goes through each stored job, picks out the affected ones, and shuffles them off to storage.
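Both the self-parking and the scanner options could be sketched as follows, assuming each job can be identified by the service it scrapes. `ParkedJobStore` and `JobParking` are hypothetical, the store is an in-memory stand-in for the stopped-jobs table, and a plain array stands in for the Sidekiq queue. Against a real queue, the scanner would presumably iterate `Sidekiq::Queue` from `sidekiq/api` and call `delete` on matching jobs.

```ruby
require "set"

# Hypothetical in-memory stand-in for a "stopped jobs" database table.
class ParkedJobStore
  def initialize
    @rows = []
  end

  def park(job)
    @rows << job
  end

  # Removes and returns every parked job for the given service, oldest first.
  def release(service)
    released, @rows = @rows.partition { |job| job[:service] == service }
    released
  end

  def count
    @rows.size
  end
end

# Hypothetical helper; `queue` is a plain array standing in for the real job queue.
class JobParking
  def initialize(queue, store, paused_services)
    @queue = queue
    @store = store
    @paused = paused_services # a Set of paused service names
  end

  # Option 1: the job checks at run time and parks itself instead of scraping.
  def perform(job)
    if @paused.include?(job[:service])
      @store.park(job)
      return :parked
    end

    :ran # the actual scrape would happen here
  end

  # Option 2: a scanner sweeps the queue and moves matching jobs into storage.
  def sweep(service)
    moved = @queue.select { |job| job[:service] == service }
    @queue.reject! { |job| job[:service] == service }
    moved.each { |job| @store.park(job) }
    moved.size
  end

  # When scraping starts back up, unpause the service and re-enqueue its jobs.
  def resume(service)
    @paused.delete(service)
    jobs = @store.release(service)
    @queue.concat(jobs)
    jobs.size
  end
end
```

The self-parking option only catches jobs as they come up to run, while the sweep clears the backlog in one pass; the two can be combined.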
We also need the ability to stop Sidekiq completely and to start it up again.