datamade / django-councilmatic

:heartpulse: Django app providing core functions for *.councilmatic.org
http://councilmatic.org
MIT License

import_data: Consider adding a mechanism for handling stale EventMedia #239

Closed by reginafcompton 1 year ago

reginafcompton commented 5 years ago

Recently, an incorrect, out-of-date EventMedia appeared in the Metro database: the Ad Hoc Congestion, Highway and Roads Committee had a media URL that belonged to the Planning and Programming Committee.

id  |                               url                                |                    event_id                    |    note     |          updated_at
-----+------------------------------------------------------------------+------------------------------------------------+-------------+-------------------------------
 309 | http://metro.granicus.com/MediaPlayer.php?view_id=2&clip_id=1040 | ocd-event/07c53724-84b3-48e9-ae6c-89955e8b0c98 | Audio       | 2019-02-22 16:25:18.184621+00
 308 | http://metro.granicus.com/MediaPlayer.php?view_id=2&clip_id=1041 | ocd-event/07c53724-84b3-48e9-ae6c-89955e8b0c98 | Audio (SAP) | 2019-02-22 16:25:18.184621+00
 298 | http://metro.granicus.com/MediaPlayer.php?view_id=2&clip_id=1044 | ocd-event/07c53724-84b3-48e9-ae6c-89955e8b0c98 | Audio       | 2019-02-21 00:40:15.772434+00

It seems possible that the wrong URL was briefly ported to Legistar, the scrapers scraped it, and Councilmatic imported it. The scrapers iterate over all media when updating an event, so the OCD API ultimately had the correct data. The Councilmatic database did not, because import_data only adds or updates media and has no mechanism for removing stale media URLs.

Let's consider adding this functionality.
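
One possible shape for this, as a rough sketch rather than a proposal for the actual import_data code: after importing an event's media, compare the URLs already stored for that event against the URLs the OCD API currently returns, and delete whatever is no longer listed. Everything below is hypothetical (the `MediaRow` class, the `find_stale_media` helper, and the example.com URLs are made up for illustration) and it uses plain Python instead of the real models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaRow:
    """Hypothetical stand-in for one row of the EventMedia table."""
    id: int
    url: str
    event_id: str
    note: str


def find_stale_media(existing_rows, fresh_urls):
    """Return stored rows whose URL is absent from the freshly imported list."""
    fresh = set(fresh_urls)
    return [row for row in existing_rows if row.url not in fresh]


# Made-up data: three stored rows for a single event.
EVENT_ID = "ocd-event/example"
existing = [
    MediaRow(1, "http://example.com/clip/10", EVENT_ID, "Audio"),
    MediaRow(2, "http://example.com/clip/11", EVENT_ID, "Audio (SAP)"),
    MediaRow(3, "http://example.com/clip/99", EVENT_ID, "Audio"),
]

# Assumption: the API now lists only clips 10 and 11 for this event.
fresh_urls = ["http://example.com/clip/10", "http://example.com/clip/11"]

stale = find_stale_media(existing, fresh_urls)
stale_ids = [row.id for row in stale]
print(stale_ids)
```

In the Django code, the same comparison could presumably be done on the event's media queryset with `exclude(url__in=fresh_urls)` followed by `delete()`, scoped to one event at a time so that media for events missing from the current import is left alone.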

Related to #236