datamade / django-councilmatic

:heartpulse: Django app providing core functions for *.councilmatic.org
http://councilmatic.org
MIT License
26 stars 16 forks source link

OCD model migration #240

Closed fgregg closed 5 years ago

fgregg commented 5 years ago

This PR changes django-councilmatic to use opencivicdata models.

In concert with https://github.com/datamade/chi-councilmatic/pull/252, these code changes can largely reproduce the Chicago Councilmatic site.

TODO:

fgregg commented 5 years ago

@hancush this is ready for me to indoctrinate you.

hancush commented 5 years ago

logging to return to later: some memberships don't have end dates in the ocd api, e.g., rahm https://ocd.datamade.us/ocd-person/f649753d-081d-4f22-8dcf-3af71de0e6ca/

in django-councilmatic, we store null values as null, but ocd stores null values as empty strings.

this causes things to break when we perform operations that assume a certain type, e.g.,

https://github.com/datamade/django-councilmatic/blob/daad426bc60b585464eb65535e78b930e6bbe707/councilmatic_core/models.py#L250-L252

this may come to bear on other fields, too.

hancush commented 5 years ago

logging for myself: chicago doesn't have bill documents!

select * from opencivicdata_billdocument as doc join opencivicdata_bill as bill on doc.bill_id = bill.id join opencivicdata_organization as org on bill.from_organization_id = org.id where org.jurisdiction_id = 'ocd-jurisdiction/country:us/state:il/place:chicago/government';
 id | note | date | bill_id | created_at | updated_at | extras | id | identifier | title | classification | subject | legislative_session_id | from_organization_id | locked_fields | created_at | updated_at | extras | id | name | image | classification | founding_date | dissolution_date | jurisdiction_id | parent_id | locked_fields
----+------+------+---------+------------+------------+--------+----+------------+-------+----------------+---------+------------------------+----------------------+---------------+------------+------------+--------+----+------+-------+----------------+---------------+------------------+-----------------+-----------+---------------
(0 rows)
hancush commented 5 years ago

@fgregg making good progress here.

can you say more about:

?

hancush commented 5 years ago

re: add order_by to meta class on EventAgendaItem, there's already clean_agenda_items which orders and dedupes agenda items.

https://github.com/datamade/django-councilmatic/blob/8d64575cf93470d13db99db9f4d42afd3a67cc5c/councilmatic_core/models.py#L711-L721

we already use this property to display agenda items. should be able to continue using with minor changes, e.g., i think it's event.agenda to get the agenda items (not event.agenda_items).

fgregg commented 5 years ago

if you add ordering to the model class, it will, by default return querysets in that order. This is just an opportunity to clean up some code, and not required for this migration.

https://docs.djangoproject.com/en/2.2/ref/models/options/#ordering

hancush commented 5 years ago

Change log

Description

tl;dr - Remove the need for extensive and error-prone ETL by basing the Councilmatic app off Open Civic Data models, i.e., a jurisdiction-specific database fed by the scrapers, rather than maintaining a remote database containing a set of similar, Councilmatic-specific models.

There is a Python package containing Django versions of the OCD models. The relevant ones live in opencivicdata.core.models and opencivicdata.legislative.models.

Further reading:

Approach

Extend Open Civic Data models

Subclassing

Leverage multi-table inheritance to add additional fields to OCD models. The primary use case in the refactor is adding slugs to first-class models – Person, Event, Bill, and Organization - and adding a headshot to Person.

Since the scrapers create OCD objects, we added signals to each of the subclassed models to create the related Councilmatic model on save. Read more on Django signals. »

We have also added a management command to import headshot files.

Proxying

We make extensive use of proxy models to add custom managers and additional properties and methods to classes for use in Councilmatic code. This is an issue because the proxied models return OCD objects, not Councilmatic objects. To get around that, we used the hacky but effective django-proxy-overrides library to force models to return related objects as Councilmatic models, as needed.

To customize the model of related objects, proxy a model like opencivicdata.legislative.models.BillAction -> councilmatic_core.models.BillAction, and override one or more related object attributes with a ProxyForeignKey.

A brief summary of alternative approaches and why we ultimately did not choose to use them can be found here.

Related changes

Upgrade to Django 2

Repair and extend the tests

Remaining changes