Refresh chromedp context on long scraper delay

jpahm commented 1 month ago

Currently, the coursebook scraper calls utils.RetryHTTP (defined in utils/methods.go) to handle automatic retries for requests. In these calls, it provides a callback which, after a certain number of retries, enters a "long delay" state where it simply waits for a long period of time (i.e. 5 minutes) before attempting to query Coursebook again.

This is not sufficient, however: the scraper often fails to recover from this "long delay" state and only makes progress once restarted. This suggests a need not only to wait for a longer period of time, but also to create a new chromedp context in order to fix the issue.

As such, the following actions should be taken:

- [ ] Develop a utility function which can establish a new chromedp context that picks up from where an old one left off
- [ ] Add calls to this utility function when the scraper enters a long delay state

greeshiee commented 1 month ago

I'd love to attempt tinkering with this if that's okay!

jpahm commented 1 month ago

> I'd love to attempt tinkering with this if that's okay!

Absolutely!

UTDNebula / api-tools

Refresh chromedp context on long scraper delay #33