Open shrayshray opened 5 years ago
Here is a document explaining the current scraping/syncing schedule: https://docs.google.com/document/d/1h_PSQiO9qK-UaRxIJa5qObX6YQq18ErzRk1N2lMbciM/edit?usp=sharing
Awesome, thank you!!! Will review with Omar to see if we need any further clarification, but on initial review, this looks very thorough - thank you!
Per @hancush, the Google doc linked above is out of date. Until it's updated, here is the schedule info Hannah provided, so we have it for reference.
Nightly, Saturday through Thursday
o 8:05p PDT / 7:05p PST: Regular speed full person, event, and bill scrapes

Saturday through midday Friday
o Every 15 minutes: Windowed bill and event scrapes of changes in past 72 minutes*

Support window
PDT
o 12a to 1:50p: Regular windowed scrapes
o 2p to 10:50p: Fast full scrapes at the top of every hour; windowed bill and event scrapes of changes in the past day* twice an hour
o 11p onward: Regular windowed scrapes
PST
o 12a to 12:50p: Regular windowed scrapes
o 1p to 9:50p: Fast full scrapes at the top of every hour; windowed bill and event scrapes of changes in the past day* twice an hour
o 10p onward: Regular windowed scrapes
*Note that many changes to bills and events do not update the last updated flag, including toggling an agenda from private to public. This is why we run full scrapes so aggressively when we know changes like this are likely.
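For anyone who wants to check which mode should be active at a given time on a support-window day, here is a rough sketch of the hours listed above. The function name, the mode labels, and the dictionary layout are made up for illustration and are not taken from the scrapers repo. It also assumes the listed end times (10:50p / 9:50p) are inclusive and that anything outside the fast window falls back to regular windowed scrapes.

```python
from datetime import time

# Fast-scrape hours copied from the support window schedule above,
# in local Pacific time. Keyed by whether PDT (daylight time) is in effect.
# The names here are hypothetical, not from the real scraper config.
FAST_WINDOW = {
    True: (time(14, 0), time(22, 50)),   # PDT: 2p to 10:50p
    False: (time(13, 0), time(21, 50)),  # PST: 1p to 9:50p
}


def support_window_mode(local_time: time, is_pdt: bool) -> str:
    """Return the expected scrape mode on a support-window day."""
    start, end = FAST_WINDOW[is_pdt]
    if start <= local_time <= end:
        # Fast full scrapes hourly, plus day-windowed scrapes twice an hour.
        return "fast full + day-windowed"
    # Assumption: every other time of day uses regular windowed scrapes.
    return "regular windowed"


if __name__ == "__main__":
    print(support_window_mode(time(18, 0), is_pdt=True))
    print(support_window_mode(time(23, 0), is_pdt=True))
```

This only describes what the schedule says should run. As noted in this thread, a long-running full scrape can block the other scrapes, so the actual behavior may differ.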
You can see that the nightly scrape runs in the middle-ish of the support window. Agendas were posted at 6 p.m. Friday, during the full scrape run. The full scrape prevents other scrapes from running, so the fast full scrapes that should have captured the updated events did not run. The full scrape also took nearly 7 hours to complete, so by the time it was done, the support window had ended and we had reverted to windowed scrapes every 15 minutes.
Because we run fast full scrapes at the top of every hour during the support window, the regular speed full scrape is redundant. So, I'd like to turn off the regular speed full scrape on Fridays, to prevent this from happening again.
Thank you for updating this issue, @shrayshray! Connects https://github.com/datamade/scrapers-us-municipal/issues/38.
Hi Team,
Can this be reviewed and updated based on the changes and the recent migration to Heroku?
Thanks!
Part of this pull in the scrapers repo.
Omar and I are putting together a reference for staff about what to expect regarding the syncing/scraping schedule. So if, for example, someone updates a published report in Legistar on a Tuesday afternoon, and wants to know when the change will be reflected on the site, we can give them an accurate time frame (and know when to contact Datamade if the expected updates are not reflected).
Below is the latest information Omar and I understand to be correct, plus a few questions. Could you please review and validate and/or correct our assumptions?
Sync/Scraping Schedule
Full Scrapes – All events and bills are scraped.
Windowed Scrapes – Partial scrapes: updates to bills previously scraped.
A. Meetings/Events
B. Reports/Bills
C. Live Event Video Links
Questions: