UTDNebula / api-tools

CLI-based tool which facilitates the scraping, parsing, and uploading of data for Nebula Labs' API.
MIT License

Refresh chromedp context on long scraper delay #33

Open jpahm opened 1 month ago

jpahm commented 1 month ago

Currently, the coursebook scraper calls utils.RetryHTTP (defined in utils/methods.go) to retry failed requests automatically. Each call passes a callback which, after a certain number of retries, enters a "long delay" state in which it simply waits for a long period of time (i.e. 5 minutes) before attempting to query Coursebook again.

This is not sufficient, however: the scraper often fails to recover from this "long delay" state and only makes progress once it is restarted. This suggests that the scraper needs not only to wait for a longer period of time, but also to create a new chromedp context, in order to fix the issue.
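To make the idea concrete, here is a minimal sketch of a retry loop that replaces its browser session after a long delay. It is not the existing utils.RetryHTTP code: `session`, `retryWithRefresh`, `errStale`, and all parameter names are hypothetical, and the session is a stand-in so the example runs with the standard library alone. In the real scraper, `newSession` would presumably wrap chromedp.NewContext and `cancel` would be the cancel function it returns.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errStale is a hypothetical error used to simulate a stuck browser session.
var errStale = errors.New("stale session")

// session stands in for a chromedp context and its cancel function.
// Assumption: the real scraper would hold the values returned by
// chromedp.NewContext here instead of a plain id.
type session struct {
	id     int
	cancel func()
}

// retryWithRefresh calls request up to maxAttempts times. After every
// retriesBeforeRefresh consecutive failures it sleeps for longDelay,
// cancels the current session, and creates a fresh one before retrying.
// retriesBeforeRefresh must be greater than zero.
func retryWithRefresh(
	newSession func() *session,
	request func(*session) error,
	maxAttempts, retriesBeforeRefresh int,
	longDelay time.Duration,
) error {
	s := newSession()
	// The closure reads s when it runs, so it cancels whichever session is current.
	defer func() { s.cancel() }()

	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if lastErr = request(s); lastErr == nil {
			return nil
		}
		if attempt%retriesBeforeRefresh == 0 && attempt < maxAttempts {
			time.Sleep(longDelay) // the "long delay" state
			s.cancel()            // tear down the possibly stuck session
			s = newSession()      // and start over with a fresh one
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", maxAttempts, lastErr)
}

func main() {
	created := 0
	newSession := func() *session {
		created++
		return &session{id: created, cancel: func() {}}
	}
	// Simulated request: the first session never recovers, later ones work.
	request := func(s *session) error {
		if s.id == 1 {
			return errStale
		}
		return nil
	}
	err := retryWithRefresh(newSession, request, 6, 3, 10*time.Millisecond)
	fmt.Println("error:", err, "sessions created:", created)
}
```

The refresh happens inside the retry loop rather than in the caller, so a request that keeps failing against one session gets a fresh session without restarting the whole scraper. Whether that fits the callback signature utils.RetryHTTP expects would need to be checked against the actual code.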

As such, the following actions should be taken:

greeshiee commented 1 month ago

I'd love to attempt tinkering with this if that's okay!

jpahm commented 1 month ago

> I'd love to attempt tinkering with this if that's okay!

Absolutely!