datamade / django-councilmatic

:heartpulse: Django app providing core functions for *.councilmatic.org
http://councilmatic.org
MIT License

ways to speed up loaddata, take advantage of last_updated_at #14

Closed cathydeng closed 8 years ago

cathydeng commented 8 years ago

would be nice to grab the most recent ocd_updated_at timestamp that we have stored, and use that as cutoff for checking new data on ocd, to minimize how much stuff we have to look at.

to do this, we'd need to ensure that the most recent ocd_updated_at timestamp that we have stored is reliable, i.e. that we actually have looked at everything updated before that time (for example, timestamp may not be reliable to use as a cutoff if a previous loaddata task started but didn't finish)

options for implementation:

  1. we sort stuff on ocd api by last_updated_at and then loaddata looks at stuff from oldest to newest
  2. we have another table recording all loaddata tasks & whether they finished running
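
A minimal sketch of option 2, not the project's actual implementation. It uses the standard-library sqlite3 module instead of a Django model so it runs on its own; the table name loaddata_run and the helper functions are hypothetical. It assumes the safe cutoff is the start time of the most recent run that finished, so a run that started but never finished is ignored.

```python
import sqlite3
from datetime import datetime, timezone


def init_db(conn):
    # Hypothetical table: one row per loaddata run.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS loaddata_run ("
        " id INTEGER PRIMARY KEY,"
        " started_at TEXT NOT NULL,"
        " finished_at TEXT)"  # stays NULL until the run completes
    )
    conn.commit()


def start_run(conn, now=None):
    # Record that a run has started and return its row id.
    now = now or datetime.now(timezone.utc)
    cur = conn.execute(
        "INSERT INTO loaddata_run (started_at) VALUES (?)",
        (now.isoformat(),),
    )
    conn.commit()
    return cur.lastrowid


def finish_run(conn, run_id, now=None):
    # Mark a run as finished; only call this after all data is loaded.
    now = now or datetime.now(timezone.utc)
    conn.execute(
        "UPDATE loaddata_run SET finished_at = ? WHERE id = ?",
        (now.isoformat(), run_id),
    )
    conn.commit()


def reliable_cutoff(conn):
    # Start time of the latest finished run, or None if no run has finished.
    row = conn.execute(
        "SELECT started_at FROM loaddata_run"
        " WHERE finished_at IS NOT NULL"
        " ORDER BY id DESC LIMIT 1"
    ).fetchone()
    return row[0] if row else None
```

When reliable_cutoff returns None, loaddata would fall back to a full import. Using the start time rather than the finish time is deliberately conservative: anything updated on ocd while the run was in progress gets looked at again next time.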

fgregg commented 8 years ago

Since there are still bills from the 2011 legislative session that are working their way through the council, we can't just update bills from the current legislative session.

This means that, right now, we have to run python manage.py loaddata --fullhistory, which downloads 65K+ pieces of legislation.

It's time to revisit our update strategy and leverage our ability to sort the data by updated_at:

http://ocd.datamade.us/bills/?from_organization=ocd-organization/ef168607-9135-4177-ad8e-c1f7a4806c3a&sort=-updated_at

fgregg commented 8 years ago

You may want to do something like:

http://ocd.datamade.us/bills/?from_organization=ocd-organization/ef168607-9135-4177-ad8e-c1f7a4806c3a&sort=updated_at&updated_at__gte=2016-01-01

That gets everything that has been updated since the last successful import into local, sorted from oldest to newest.
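
A rough sketch of that approach, under a few assumptions: the helper names bills_url and import_oldest_first are hypothetical, each bill is assumed to be a dict carrying an updated_at value, and the bills are assumed to arrive already sorted oldest-first (as sort=updated_at requests). The cutoff is advanced only after a bill is saved, so an import that dies partway leaves a cutoff that is still safe to resume from.

```python
from urllib.parse import urlencode

BASE_URL = "http://ocd.datamade.us/bills/"
ORG_ID = "ocd-organization/ef168607-9135-4177-ad8e-c1f7a4806c3a"


def bills_url(cutoff):
    # Build the query from the example above: bills updated on or after
    # `cutoff`, sorted oldest-first.
    params = {
        "from_organization": ORG_ID,
        "sort": "updated_at",
        "updated_at__gte": cutoff,
    }
    # safe="/" keeps the slash in the organization id unescaped.
    return BASE_URL + "?" + urlencode(params, safe="/")


def import_oldest_first(bills, save, record_cutoff):
    # `bills` must already be ordered oldest-first by updated_at.
    # `save` stores one bill locally; `record_cutoff` persists the new cutoff.
    for bill in bills:
        save(bill)  # may raise; the cutoff is then left at the previous bill
        record_cutoff(bill["updated_at"])
```

Because the query uses updated_at__gte, resuming from a recorded cutoff will fetch the bill at that exact timestamp again, so save should be safe to repeat on a bill that is already stored.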

fgregg commented 8 years ago

We are doing this now for bills.