Metro-Records / la-metro-councilmatic

:metro: An instance of councilmatic for LA Metro
MIT License

refresh_pic cron job did not refresh board report #621

Closed fgregg closed 3 years ago

fgregg commented 4 years ago

This morning, an attachment was updated for the board report 2020-0380 and the board report was amended to mention the attachment.

According to the database, the last time this report changed was four days ago. Because the report did not appear to have changed, it was not picked up by the refresh_pic scheduled job.

I manually removed the PIC entry and then reran compile_pdfs --all-documents, and everything was restored.

Possible mitigations

The last updated timestamps of bill texts and attachments are available through the API. We can set these fields in the scraper.

Then, if the files are updated and the last modified field is properly set, the report object will be updated and refresh_pic will work correctly.

That would not have worked in this case, because although @shrayshray says they updated the report text this morning to mention the attachment, the last modified field says it was last touched four days ago.

hancush commented 4 years ago

Seems like we might benefit from similar logic here as we use in the scraper, i.e., refresh the PIC for any modified documents, as well as ones associated with future events, or that appear on the agenda for future events.

hancush commented 4 years ago

Just deployed a change that will update documents associated with recently updated board reports and events, as well as bill and event documents associated with upcoming events, e.g., documents with bills on the agenda. I'll keep this issue open to monitor for the June support cycle.
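For readers following along, the selection rule described above can be sketched roughly as follows. This is an illustration only, not the deployed code: the `BoardReport` and `Event` classes, the `needs_refresh` function, and the one-day window are all hypothetical stand-ins for the real models and settings.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List


@dataclass
class Event:
    # Hypothetical stand-in for an event with an agenda.
    start: datetime
    updated_at: datetime


@dataclass
class BoardReport:
    # Hypothetical stand-in for a board report and the events whose agendas list it.
    identifier: str
    updated_at: datetime
    events: List[Event] = field(default_factory=list)


def needs_refresh(report: BoardReport, now: datetime,
                  window: timedelta = timedelta(days=1)) -> bool:
    """Return True if the cached document for this report should be refreshed.

    A report qualifies if it was recently updated, or if any event it is
    associated with was recently updated or has not happened yet.
    """
    cutoff = now - window
    if report.updated_at >= cutoff:
        return True
    for event in report.events:
        if event.updated_at >= cutoff or event.start >= now:
            return True
    return False
```

The point of the second condition is that a report on the agenda of an upcoming event is refreshed even when its own timestamp looks stale, which is the case that tripped up the original job.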

hancush commented 4 years ago

This happened again:

Hello, Datamade! Board Report 2020-0414 was revised at 10:48 this morning and the agenda it’s on was republished. The embedded agenda PDF and agenda page sidebar show this update, but the embedded PDF on the report page does not. Do you know what might cause this discrepancy, and can you make sure the embedded report updates? I attached the revised report for reference on what we expect to see on the report page.

derekeder commented 4 years ago

One thought here: When triaging this issue, I couldn't find an easy way to locate and delete the cached document in the S3 bucket. I ended up modifying the pic code locally to delete it with boto.

Though this would be treating the symptom and not the cause, it could help to have an easier way to bust the cache for a specific document. One idea would be to pass a flag into the URL to force it to delete the file from S3 and re-fetch the original:

https://pic.datamade.us/lametro/document/?document_url=https%3A%2F%2Fmetro.legistar.com%2FViewReport.ashx%3FM%3DR%26N%3DTextL5%26GID%3D557%26ID%3D6856%26GUID%3DLATEST%26Title%3DBoard%2BReport&filename=agenda&cachebust=true

Think this would be worth doing?
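A minimal sketch of how the proposed flag could behave, under loud assumptions: `cachebust` is only the parameter suggested in this comment, not an existing feature; a plain dict stands in for the S3 bucket; and `get_document` and the `fetch` callable are hypothetical names, not part of the pic code.

```python
from typing import Callable, Dict
from urllib.parse import parse_qs, urlparse


def get_document(request_url: str,
                 cache: Dict[str, bytes],
                 fetch: Callable[[str], bytes]) -> bytes:
    """Serve a document from the cache, optionally forcing a re-fetch.

    `cache` stands in for the S3 bucket and `fetch` for the call that
    downloads the original document; both are assumptions for illustration.
    """
    # parse_qs percent-decodes values, so document_url comes back as a plain URL.
    query = parse_qs(urlparse(request_url).query)
    document_url = query["document_url"][0]
    cachebust = query.get("cachebust", ["false"])[0].lower() == "true"

    if cachebust:
        # Drop the cached copy so the block below re-fetches the original.
        cache.pop(document_url, None)

    if document_url not in cache:
        cache[document_url] = fetch(document_url)
    return cache[document_url]
```

Without the flag the cached copy is returned untouched, so existing requests would behave as they do today.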

hancush commented 4 years ago

@derekeder That's a really good thought.

The process to delete a document manually is outlined here: https://github.com/datamade/la-metro-councilmatic/issues/443

It has more steps than I like, but I'd rather use development time to figure out why the cache isn't updating.

derekeder commented 4 years ago

Cool - works for me

hancush commented 4 years ago

Hm, this old issue suggests that some browsers aggressively cache <iframe> elements: https://github.com/datamade/la-metro-councilmatic/issues/130.

We implemented a solution that uses the last updated timestamp to bust the browser cache for event documents, but we did not implement the same solution for board report pages. (We also know that the last updated timestamp is not a reliable indicator that something has changed!) I wonder if this was at play for the most recent issue?

Perhaps not, since removing the document from the cache resolved the issue. But noting, just in case!
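For illustration, busting the browser cache on an embedded document usually means changing the `<iframe>` source URL whenever the document changes. Since the last updated timestamp is known to be unreliable here, the sketch below swaps in a different technique, a hash of the document content, as the changing value. The function name `with_cache_buster` and the `v` parameter are hypothetical, and this is not what the event pages currently do.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def with_cache_buster(src: str, content: bytes, param: str = "v") -> str:
    """Return `src` with a query parameter that changes when `content` changes."""
    digest = hashlib.sha256(content).hexdigest()[:12]
    parts = urlparse(src)
    # Keep existing query parameters, replacing any previous cache-buster value.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != param]
    query.append((param, digest))
    return urlunparse(parts._replace(query=urlencode(query)))
```

The trade-off is that hashing requires having the document bytes on hand when the page is rendered, whereas a timestamp is cheap to read but only helps when it is actually updated.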

hancush commented 4 years ago

Patched the latest bug in https://github.com/datamade/django-councilmatic/pull/268 and deployed the new version of django-councilmatic to production. Will monitor this.

@shrayshray – Can you let me know next time you update a board report (not an attachment, but a board report itself), so we can check that the cache is refreshed appropriately?