Metro-Records / la-metro-councilmatic

:metro: An instance of councilmatic for LA Metro
MIT License

BEFORE OCT 12 – pic.datamade.us board report is out of sync with Legistar #347

hancush closed 6 years ago

hancush commented 6 years ago

Board report 2018-0140 was slated for a meeting during the summer but was postponed. While the link we have on file in the bill documents table resolves to the correct file in Legistar, the URL generated by the full_text_document_url template tag still points to the July version on pic.datamade.us.

hancush commented 6 years ago

FWIW, we have the correct version on file in our metro-pdf-merger S3 bucket.

hancush commented 6 years ago

Also, to be clear, the correct version of the report comes back when you download the report from the board report and event pages. The only place we are showing the wrong report is the PDF pane on the board report page.

reginafcompton commented 6 years ago

Problem

In the OCD and Councilmatic databases, the URL for a Board Report points to the latest version of the PDF (generated here).

When Councilmatic needs to render a PDF, it visits https://pic.datamade.us/lametro/document/, where the property image cache does some work:

What's the issue? We use the Board Report's URL as the key, and this URL remains stable, even when the document that Legistar serves changes. The PIC would not know about such changes, since it already cached an earlier version.
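
To make the failure mode concrete, here is a minimal sketch of a cache keyed by URL. It is not the property-image-cache code: `UrlKeyedCache`, the example URL, and the in-memory `origin` dict are all hypothetical stand-ins for the real service, Legistar, and S3.

```python
from typing import Callable, Dict


class UrlKeyedCache:
    """Toy stand-in for a document cache that uses the source URL as its key."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch
        self._store: Dict[str, bytes] = {}

    def get(self, url: str) -> bytes:
        # Only the first request for a given URL reaches the origin.
        if url not in self._store:
            self._store[url] = self._fetch(url)
        return self._store[url]

    def invalidate(self, url: str) -> None:
        # Dropping a key that is not cached is a harmless no-op.
        self._store.pop(url, None)


# Hypothetical origin: one stable URL whose content changes over time.
url = "https://example.com/report.pdf"
origin = {url: b"July version"}
cache = UrlKeyedCache(lambda u: origin[u])

first = cache.get(url)             # cached on the first visit
origin[url] = b"October version"   # the origin replaces the file at the same URL
stale = cache.get(url)             # the cache still serves the July copy
cache.invalidate(url)
fresh = cache.get(url)             # refetched only after invalidation
```

Because the key never changes, nothing short of an explicit invalidation tells the cache that its copy is out of date.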

Solutions

hancush commented 6 years ago

@reginafcompton would it be too hamfisted / difficult to connect services, to update the cached image when the bill changes?

reginafcompton commented 6 years ago

We need to ensure that the property-image-cache has the most up-to-date PDFs of board reports. An effective strategy: delete the old PDFs from the S3 bucket whenever a bill gets updated. The document route will then create a new entry in AWS when someone next visits a board report page on the Councilmatic site.

After consulting with @evz, a good solution entails devising a new management command that does the following:

  1. executes after import_data
  2. queries the Councilmatic database for newly updated bills – we can query the raw_billdocuments table. (n.b. This also contains new data, but that should not be an issue, since the delete function in S3 simply "does not remove any objects" if the bucket does not contain the specified key.)
  3. deletes the entries for those bills in the S3 bucket – possible with a single HTTP request
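
The steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the script that was eventually merged: it assumes the updated document URLs have already been queried from `raw_billdocuments`, that the S3 object key is the document URL itself (the thread only says the URL is used as the key), and that the client exposes boto3's `delete_objects(Bucket=..., Delete=...)` call. `refresh_cache`, `batched_delete_payloads`, `FakeS3Client`, and the bucket name are hypothetical.

```python
from typing import Any, Dict, Iterable, Iterator, List, Tuple

MAX_KEYS_PER_REQUEST = 1000  # S3 DeleteObjects accepts at most 1000 keys per call


def batched_delete_payloads(keys: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield `Delete` payloads for S3, one per batch of up to 1000 unique keys."""
    unique = list(dict.fromkeys(keys))  # de-duplicate while preserving order
    for start in range(0, len(unique), MAX_KEYS_PER_REQUEST):
        chunk = unique[start:start + MAX_KEYS_PER_REQUEST]
        yield {"Objects": [{"Key": key} for key in chunk], "Quiet": True}


def refresh_cache(s3_client: Any, bucket: str, document_urls: Iterable[str]) -> int:
    """Delete cached copies for the given URLs; return the number of requests sent.

    Assumption: the cache key in the bucket is the document URL itself.
    """
    requests_sent = 0
    for payload in batched_delete_payloads(document_urls):
        s3_client.delete_objects(Bucket=bucket, Delete=payload)
        requests_sent += 1
    return requests_sent


class FakeS3Client:
    """In-memory stand-in so the sketch runs without AWS credentials."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Dict[str, Any]]] = []

    def delete_objects(self, Bucket: str, Delete: Dict[str, Any]) -> Dict[str, Any]:
        self.calls.append((Bucket, Delete))
        return {}


fake = FakeS3Client()
sent = refresh_cache(fake, "example-bucket", ["a.pdf", "b.pdf", "a.pdf"])
```

In a real management command, `FakeS3Client()` would be replaced by `boto3.client("s3")`. Since keys that are not in the bucket are simply ignored, newly added documents in the query result need no special handling.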

shrayshray commented 6 years ago

The logic we discussed for consistent treatment of reports and PDF rendering is to:

  1. Check whether "Not Viewable via InSite" is True or False. If True, stop/do not display. If False,
  2. Check report type (this step becomes necessary once the archive of pre-2015 board documents and Board Boxes is added to Legistar). If "Board Box", display report and PDF. If not,
  3. Check whether the report is on a published agenda. If True, display report and PDF. If False, stop/do not display.
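
As a sketch, the three checks above reduce to a single predicate. The `BoardReport` class and its field names are hypothetical and only mirror the wording of the list; they are not taken from the Councilmatic models.

```python
from dataclasses import dataclass


@dataclass
class BoardReport:
    # Hypothetical fields mirroring the three checks described above.
    not_viewable_via_insite: bool
    report_type: str
    on_published_agenda: bool


def should_display(report: BoardReport) -> bool:
    """Return True if the report and its PDF should be shown."""
    # 1. Reports flagged "Not Viewable via InSite" are never displayed.
    if report.not_viewable_via_insite:
        return False
    # 2. Board Boxes are displayed regardless of agenda status.
    if report.report_type == "Board Box":
        return True
    # 3. Everything else is displayed only if it is on a published agenda.
    return report.on_published_agenda


hidden = should_display(BoardReport(True, "Board Box", True))
board_box = should_display(BoardReport(False, "Board Box", False))
unpublished = should_display(BoardReport(False, "Informational Report", False))
```

Here `hidden` and `unpublished` are False and `board_box` is True; the report type string "Informational Report" is just a placeholder for any non-Board-Box type.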

reginafcompton commented 6 years ago

@shrayshray - Councilmatic now has a script that will refresh the document cache every time a bill or event changes. We'll review this script, merge it, and add it to the data import pipeline early next week.

For the PDF rendering, I'll add the logic you note, though that seems like it's more related to this issue: https://github.com/datamade/la-metro-councilmatic/issues/345. So, I'll keep track of any relevant updates there.

reginafcompton commented 6 years ago

@shrayshray - I've added the script for refreshing the document cache to the Metro data pipeline! Closing this issue.