OSU-Sustainability-Office / automated-jobs

This repository contains various batch-like containerized tasks for OSU SO's data collection operations.

Pacific Power website instability causing scraper to end early #68

Open · s-egge opened 2 months ago

s-egge commented 2 months ago

The PP (Pacific Power) scraper has been reporting duplicate upload errors over the weekend. While reading the logs for those errors, I noticed that the scraper also appears to stop after only a few meters, which is unrelated to the duplicate data issue.

While troubleshooting, I found that some meters on the Pacific Power website do not show any data on the monthly data tab, even after switching tabs back and forth, refreshing, or logging out and back in. These meters have historically had monthly data and probably still do, but Pacific Power is not able to display it right now. In the past, the scraper could get the data to show by switching back and forth between the yearly and monthly views, but that is not working in this case.

Based on the logs and testing, this is happening to different meters on different days. Each time, it causes the scraper to end early because the scraper thinks it has reached the end of the list of meters with monthly data: previously, all meters without monthly data have been at the bottom of the list, so the scraper stops as soon as it finds one without monthly data. For example, one of the meters that would not show monthly data on 08/03 is showing it now, which leads me to believe this is likely the Pacific Power site being unreliable.
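To make the failure mode concrete, here is a minimal sketch of the current behavior as described above. It is not the actual scraper code: `load_monthly` is a hypothetical callable standing in for "open the meter and read the monthly tab", returning the rows or `None` when Pacific Power shows nothing, and `max_toggles` is an assumed retry count for the yearly/monthly switching. The `attempt` argument only exists so a test can simulate data appearing after a toggle.

```python
from typing import Callable, Dict, Optional, Sequence

# Hypothetical: rows from the monthly tab, or None if the site shows nothing.
MonthlyRows = Optional[list]


def fetch_with_toggle(
    meter: str,
    load_monthly: Callable[[str, int], MonthlyRows],
    max_toggles: int = 3,
) -> MonthlyRows:
    """Try to read monthly data, re-toggling yearly/monthly up to max_toggles times."""
    for attempt in range(max_toggles):
        rows = load_monthly(meter, attempt)
        if rows:  # an empty list is treated the same as "no data shown"
            return rows
    return None


def scrape_stop_at_first_miss(
    meters: Sequence[str],
    load_monthly: Callable[[str, int], MonthlyRows],
    max_toggles: int = 3,
) -> Dict[str, list]:
    """Current behavior: the first meter without monthly data ends the run."""
    scraped: Dict[str, list] = {}
    for meter in meters:
        rows = fetch_with_toggle(meter, load_monthly, max_toggles)
        if rows is None:
            # Assumes every remaining meter also lacks monthly data, which is
            # wrong when the site temporarily fails to show it for one meter.
            break
        scraped[meter] = rows
    return scraped
```

With a flaky meter `B` in the middle of the list, the run stops at `B` and meter `C` is never scraped, even though it has monthly data.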

A possible fix is to add a counter for consecutive meters without monthly data and to stop only after a set number of them are found in a row, which should be a more reliable sign that we have reached the bottom of the monthly meters. This would let the scraper skip over meters it cannot get to show monthly data, and then upload the missing data over the next day or few days if the meter fixes itself.
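The proposed fix could look roughly like the sketch below. It assumes a hypothetical `get_monthly` callable (returning rows, or `None`/empty when no monthly data is shown) and a hypothetical `max_consecutive_misses` threshold; the default of 5 is an illustrative guess, not a value from the scraper. The key detail is that the counter resets whenever a meter does return data, so only an unbroken run of misses ends the scrape.

```python
from typing import Callable, Dict, Optional, Sequence


def scrape_with_miss_counter(
    meters: Sequence[str],
    get_monthly: Callable[[str], Optional[list]],
    max_consecutive_misses: int = 5,
) -> Dict[str, list]:
    """Skip meters with no monthly data; stop only after N misses in a row."""
    scraped: Dict[str, list] = {}
    misses = 0  # length of the current run of meters without monthly data
    for meter in meters:
        rows = get_monthly(meter)
        if not rows:
            misses += 1
            if misses >= max_consecutive_misses:
                break  # likely reached the meters that truly have no monthly data
            continue  # skip this meter; a later daily run can backfill it
        misses = 0  # a meter with data breaks the run of misses
        scraped[meter] = rows
    return scraped
```

Choosing the threshold is a trade-off: too low and two flaky meters in a row would still end the run early, too high and the scraper spends extra time checking meters at the bottom of the list that genuinely have no monthly data.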

s-egge commented 2 months ago

This hasn't happened again since the first time, so I'm moving the priority to low: it seems to be a very rare issue and will be difficult to reproduce and test.