Closed by hancush 4 years ago
On Friday, we run full event and bill scrapes at the top of every hour, which makes most of the regular full scrape redundant. I propose we nix the regular full scrape on Friday and run only a person scrape instead. This should remove the blocker!
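A rough sketch of the proposal, assuming the schedule is decided in Python. The function name `nightly_scrapers` is hypothetical; the scraper names `people`, `events`, and `bills` are taken from the import logs in this thread.

```python
from datetime import date

FRIDAY = 4  # date.weekday() counts Monday as 0, so Friday is 4


def nightly_scrapers(today):
    """Pick which scrapers the regular nightly run should include (hypothetical helper)."""
    if today.weekday() == FRIDAY:
        # Full event and bill scrapes already run hourly on Friday,
        # so the nightly run only needs to cover people.
        return ["people"]
    return ["people", "events", "bills"]


print(nightly_scrapers(date(2020, 4, 10)))  # 2020-04-10 was a Friday
```

Keeping the decision in one small function would make it easy to change if the hourly Friday schedule changes.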
The full bill scrape took almost seven hours last night!!!
```
lametro (scrape)
bills: {'window': '0'}
bills scrape:
duration: 6:45:34.103592
objects:
bill: 3083
vote_event: 1489
jurisdiction scrape:
duration: 0:00:00.158219
objects:
jurisdiction: 1
organization: 3
post: 18
04/11/2020 03:16:21 ERROR pupa: cannot resolve pseudo id to Organization: ~{"name": "Planning and Development (Department)"}
04/11/2020 03:16:26 ERROR pupa: cannot resolve pseudo id to Organization: ~{"name": "Operations (Department)"}
04/11/2020 03:16:28 ERROR pupa: cannot resolve pseudo id to Organization: ~{"name": "Program Management (Department)"}
04/11/2020 03:16:28 ERROR pupa: cannot resolve pseudo id to Organization: ~{"name": "Maria Luk"}
04/11/2020 03:16:36 ERROR pupa: cannot resolve pseudo id to Organization: ~{"name": "-"}
04/11/2020 03:16:41 ERROR pupa: cannot resolve pseudo id to Organization: ~{"name": "Fe Dalida"}
04/11/2020 03:17:43 ERROR pupa: cannot resolve pseudo id to Organization: ~{"name": "Chris Reyes"}
04/11/2020 03:18:59 ERROR pupa: cannot resolve pseudo id to Organization: ~{"name": "James Butts"}
04/11/2020 03:18:59 ERROR pupa: cannot resolve pseudo id to Organization: ~{"name": "Jacquelyn Dupont-Walker"}
04/11/2020 03:18:59 ERROR pupa: cannot resolve pseudo id to Organization: ~{"name": "Ara Najarian"}
04/11/2020 03:19:19 ERROR pupa: cannot resolve pseudo id to Organization: ~{"name": "Martha Welborne"}
04/11/2020 03:19:40 ERROR pupa: cannot resolve pseudo id to Organization: ~{"name": "OCEO (Department)"}
lametro (import)
people: {}
events: {}
bills: {}
import jurisdictions...
import organizations...
import people...
import posts...
import memberships...
import bills...
import events...
import vote events...
lametro (import)
people: {}
events: {}
bills: {}
import:
bill: 0 new 0 updated 3083 noop
jurisdiction: 0 new 0 updated 1 noop
organization: 0 new 0 updated 3 noop
post: 0 new 0 updated 18 noop
vote_event: 0 new 0 updated 1489 noop
```
In other words, the slow full scrape blocked other scrapes for almost the entire support window. 😓
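To put a number on it, here is a small sketch that counts the top-of-the-hour slots that pass while a scrape is still running. The duration is the one reported in the log above; the 21:00 start time is an assumption for illustration only, since the log does not record when the scrape began.

```python
from datetime import datetime, timedelta


def blocked_hourly_slots(start, duration):
    """Return the top-of-the-hour times that occur while a scrape is still running."""
    end = start + duration
    # First top of the hour strictly after the scrape starts.
    slot = start.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
    slots = []
    while slot < end:
        slots.append(slot)
        slot += timedelta(hours=1)
    return slots


# Duration from the bills scrape report; the start time is hypothetical.
duration = timedelta(hours=6, minutes=45, seconds=34, microseconds=103592)
start = datetime(2020, 4, 10, 21, 0)

print(len(blocked_hourly_slots(start, duration)))
```

Under that assumed start, six consecutive hourly scrapes would have been skipped.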
Manually ran a full event scrape to post agendas this morning.
```
lametro (scrape)
events: {}
events scrape:
duration: 0:06:34.156762
objects:
event: 391
jurisdiction scrape:
duration: 0:00:01.411950
objects:
jurisdiction: 1
organization: 3
post: 18
lametro (import)
people: {}
events: {}
bills: {}
import jurisdictions...
import organizations...
import people...
import posts...
import memberships...
import bills...
import events...
import vote events...
lametro (import)
people: {}
events: {}
bills: {}
import:
event: 1 new 6 updated 384 noop
jurisdiction: 0 new 0 updated 1 noop
organization: 0 new 0 updated 3 noop
post: 0 new 0 updated 18 noop
```
We addressed this.
We run both full and windowed scrapes on Friday. However, we prevent multiple scrapes from running at once, so in theory a full scrape could block a windowed scrape and keep recent changes from appearing for quite a while. Let's look into a Friday-night approach that balances efficiency with completeness.
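For reference, this thread does not say how overlapping scrapes are prevented. One common approach on Unix is a non-blocking file lock, sketched below. The `scrape_lock` helper and the lock path are hypothetical, not the project's actual mechanism; the point is to show how a long full scrape holding the lock causes a later windowed scrape to be skipped.

```python
import fcntl
from contextlib import contextmanager


@contextmanager
def scrape_lock(path):
    """Yield True if this run acquired the lock, False if another scrape holds it."""
    handle = open(path, "w")
    try:
        try:
            # LOCK_NB makes flock fail immediately instead of waiting.
            fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            yield False
            return
        try:
            yield True
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)
    finally:
        handle.close()
```

A caller would wrap each scrape in `with scrape_lock("/tmp/lametro-scrape.lock") as acquired:` and skip the run when `acquired` is false, which is exactly the situation where a windowed scrape loses out to a long full scrape.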