Metro-Records / la-metro-councilmatic

:metro: An instance of councilmatic for LA Metro
MIT License

Board Reports: Unpublished reports showing on website! #345

Closed shrayshray closed 5 years ago

shrayshray commented 6 years ago

A bunch of reports which are not yet published are showing up on the site. See the first 8 reports listed: https://boardagendas.metro.net/search/

E.g., 2018-0435: https://boardagendas.metro.net/board-report/2018-0435/ Its status in Legistar is “Agenda Ready”. It is NOT available on metro.legistar.com. The Agenda has not yet been published and is not available: https://boardagendas.metro.net/event/executive-management-committee-8000f2384368/

shrayshray commented 6 years ago

Follow up on this issue ... thank you so much, @reginafcompton, for resolving it right away. The reports which were showing up in advance of their agenda being published (they are on agendas for 9/19 and 9/20) had been manually changed in Legistar by unchecking the box "Not Viewable Via InSite".

Six of the 8 reports were General Public Comment, which users previously manually set to "Not Viewable ...", and then unchecked weeks/months later when it was decided the policy would be to show them. So for this month's General Public Comment reports, the users got mixed up and assumed they always needed to manually uncheck the box for these reports. The other two reports were changed within about 10 minutes on the same day, so I'm assuming the user just went with the flow he/she got into with this process.

So this was a workflow issue on the Metro side ... users not clear that the box is checked by default before the report's agenda is published, but it will be viewable once the agenda is published - unless the box is checked after publication (and we alert Datamade to this change). We're clarifying the appropriate workflow with our users. Here's what I'm wondering: Would it be possible for the Councilmatic site to follow the same logic for viewable/hidden as InSite, and first check whether the report's agenda has been published? These reports were not visible on InSite while they were visible on the Councilmatic site, because InSite is looking at whether the reports are on a published agenda.

reginafcompton commented 6 years ago

Thanks for this detailed report @shrayshray. I have a couple questions before suggesting a solution:

  1. Can you tell me about the MatterAgendaDate for a Board Report on the Legistar API (e.g., http://webapi.legistar.com/v1/metro/matters/5204)? Specifically, does a Board Report have a MatterAgendaDate only once its agenda has been published?

  2. It looks like the MatterStatusName can be "Agenda Ready", even when the agenda is not published. Can you confirm? (I was looking at this upcoming event, which contains reports listed as "Agenda Ready," though the Agenda is still a draft.)

shrayshray commented 6 years ago

@reginafcompton

  1. No, Board Reports are assigned Agenda Dates during the drafting process, not when the agenda is published.

  2. Yes, once Board Reports are ready to be published the status is changed to "Agenda Ready". It's like a cue that the drafting and approval process is complete and the Report is ready to be on an Agenda.
shrayshray commented 6 years ago

The logic we discussed for consistent treatment of reports and PDF rendering is to:

  1. Check whether "Not Viewable via InSite" is True or False. If True, stop/do not display. If False, continue to the next check.
  2. Check report type (this step becomes necessary once the archive of pre-2015 board documents and Board Boxes is added to Legistar). If "Board Box", display report and PDF. If not, continue to the next check.
  3. Check whether the report is on a published agenda. If true, display report and PDF. If false, stop/do not display.
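
A minimal sketch of the three checks above, in Python. The `Report` class and its field names (`restrict_view_via_web`, `report_type`, `on_published_agenda`) are hypothetical and chosen only for illustration; they are not taken from the Councilmatic codebase.

```python
from dataclasses import dataclass


@dataclass
class Report:
    # Hypothetical fields; the real Councilmatic model may differ.
    identifier: str
    restrict_view_via_web: bool  # the "Not Viewable via InSite" checkbox
    report_type: str             # e.g. "Board Box"
    on_published_agenda: bool    # True if an agenda listing it is published


def is_viewable(report: Report) -> bool:
    """Apply the three checks in order; True means the report may be shown."""
    # 1. A checked "Not Viewable via InSite" box always hides the report.
    if report.restrict_view_via_web:
        return False
    # 2. Board Boxes are shown regardless of agenda status.
    if report.report_type == "Board Box":
        return True
    # 3. Anything else is shown only once it is on a published agenda.
    return report.on_published_agenda
```

The order matters: the checkbox is tested first, so a restricted Board Box or a restricted report on a published agenda still stays hidden.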
reginafcompton commented 5 years ago

Breaking down the various steps:

reginafcompton commented 5 years ago

@shrayshray - I've implemented the logic suggested above. I'd like to deploy it to the staging site and test that it works as expected. For this, could you add a few test bills, ideally one for each case? That being:

Let me know a good date for testing!

shrayshray commented 5 years ago

@reginafcompton I'm working on 3 out of 4 of these and will hopefully have them available for you tomorrow. But regarding the 2nd item, "Board Box" report, we currently do not have any of these in Legistar - they're stored in the Board Archive and in the dedicated Board Box Archive and will not be available in Legistar until we migrate the Board Archive into the system. Edit: To clarify, Board Boxes are not drafted in Legistar. Because they're currently drafted manually, we store them in the Board Archive.

reginafcompton commented 5 years ago

Here are my expectations for the three cases we plan to test:

I am using the staging site as the test arena.

@shrayshray - after you add the three bills to Legistar and the scraper and import run, I'll need to rebuild the Solr index. Just let me know when you get started!

shrayshray commented 5 years ago

@reginafcompton I'm about to get started. But first, to clarify, the workflow error was that the report should be viewable, but it should be the current version -- meaning, it reflects the edits made between its appearance on the original agenda, which was cancelled, and its appearance on the new agenda. The issue was that the report showing on the site was in the state in which it originally appeared and did not reflect updates made in the meantime.

reginafcompton commented 5 years ago

@shrayshray - I believe you might be thinking of https://github.com/datamade/la-metro-councilmatic/issues/347. That's a different issue handled by the new cache-refresh script.

For this issue, we are just testing whether or not board reports are hidden vs. not hidden.

Does that sound right?

shrayshray commented 5 years ago

@reginafcompton got it. So you want me to stop the workflow reproduction after the meeting is cancelled, and not continue on and add it to a new meeting?

reginafcompton commented 5 years ago

I think the above workflow would go something like:

(1) You create a General comment report that is on an UNPUBLISHED agenda, and you check "Not Viewable Via InSite".

(2) Then, I'll run the scrape and import - nothing should get scraped!

(3) Then, you uncheck "Not Viewable Via InSite" (but do not publish the agenda).

(4) Then, I'll run the scrape and import - the report should be scraped, but not viewable on Councilmatic!

Does that sound okay?

shrayshray commented 5 years ago

Sorry for the delay! Step 1 is complete. The meeting is Planning and Programming Committee, 3/21/19. The report is 2018-0747.

reginafcompton commented 5 years ago

@shrayshray - great! I confirmed that the scraper skips the bill. Query of OCD API.

Let's move on to 3 and 4.

shrayshray commented 5 years ago

Okay, just let me know when you're ready for me to move on to Item 3!

reginafcompton commented 5 years ago

I am! Did you uncheck "Not Viewable Via InSite"? If so, I'll run the scraper.

shrayshray commented 5 years ago

Yes, just unchecked it.

reginafcompton commented 5 years ago

Great, I ran the scraper, and it pulled the bill into the OCD API. I then executed import_data, which added the bill to the Councilmatic database. Then, I updated the Solr index, and behold! The bill is not there.

https://lametro.datamade.us/search/?q=%222018-0747%22&search-all=on

@shrayshray - the system seems to be working as expected. In fact, with this example, we handled all the cases I wanted to test. How do you feel? Do you have any questions?

shrayshray commented 5 years ago

Great! Did you want to test on a published Agenda? Or checking/unchecking "Not Viewable Via Insite" on reports not on an Agenda, but with the status of Agenda Ready? Or have we covered these already?

shrayshray commented 5 years ago

@reginafcompton Could you please ensure the test meetings are removed from the Councilmatic site? There are 2:

  1. Planning and Programming Committee, 3/21/19 -- location = "test" (it was my understanding "test" in location would prevent it from showing on the site ...)
  2. Planning and Programming Committee, 3/21/19 -- location = "1 Gateway Plaza ..." this one was my mistake ... the Legistar interface didn't update to show it was created, and I thought I'd maybe forgotten to save it, so I created the second meeting.

The Planning and Programming Committee meeting on 3/20/19 is the actual meeting that month and should remain visible.

reginafcompton commented 5 years ago

@shrayshray yes, I will handle these shortly!

reginafcompton commented 5 years ago

All right - stray items have been removed from the site. I think we might want to test the example you mention above:

Checking/unchecking "Not Viewable Via Insite" on reports not on an Agenda, but with the status of Agenda Ready

In both cases, the report should NOT appear on the site. When would be a good time to try that out?

shrayshray commented 5 years ago

@reginafcompton Could you take a look at reports 2018-0749 and 2018-0750? These reports are showing up on the site, but the Agenda they're going to be on is not published yet. They do NOT have the "Not Viewable via InSite" box checked; but the logic we established should hide them based on the Agenda not being published/public yet.

reginafcompton commented 5 years ago

@shrayshray - I did not deploy the code with our system of "checks" to production (since we were still testing it!). It looks like the system works, however, since the reports in question are not visible on the staging site (e.g., https://lametro.datamade.us/search/?q=%222018-0750%22&search-all=on)

I can go ahead and deploy the new system to production.

shrayshray commented 5 years ago

@reginafcompton yes, please deploy -- thank you!

reginafcompton commented 5 years ago

It's deployed!

shrayshray commented 5 years ago

@reginafcompton Thank you! I did find one report which looks like it slipped through the cracks of the new logic: 2018-0513. It was on a (published) committee agenda previously, but has since been marked "Not Viewable Via InSite" in Legistar.

reginafcompton commented 5 years ago

@shrayshray - excuse the delay in responding. It appears that this bill continues to appear on agendas for two events in the Legistar API:

http://webapi.legistar.com/v1/metro/events/1389/eventitems (n.b. OCD API)

http://webapi.legistar.com/v1/metro/events/1491/eventitems (n.b. OCD API)

As long as the bill remains on those agendas, it will be rendered in Councilmatic.

If you remove the agenda items in Legistar, let me know, and we'll take a look at the Councilmatic data, again. If not, it seems like we can close this issue!

shrayshray commented 5 years ago

@reginafcompton I think this is still an issue, though it's resolved for the bill in question, 2018-0513 -- it was on a Committee agenda, then after the meeting someone checked "Not Viewable Via Insite" while making revisions to it. After the revisions were complete, "Not Viewable Via Insite" was unchecked and it was added to the agenda for a Regular Board Meeting.

The problem was it appeared on the Councilmatic site while the "Not Viewable Via Insite" box was checked. It wasn't appearing on metro.legistar.com at that time. Isn't the first step for the scraper to check whether "Not Viewable via InSite" is True or False, and to not scrape if True?

reginafcompton commented 5 years ago

Okay, I see. We have a tricky edge case here! I'll summarize why the Bill made it through the cracks.

  1. The scraper had already scraped the bill. Even though it was not visible on Legistar, the scraper did not have a mechanism for removing it from our database. We've raised this issue in the past.
  2. In the Councilmatic data system, the bill remained an agenda item on the Committee meeting. This raises some important points.
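
The first point can be illustrated with a toy model. This is not the actual scraper; `scrape`, the dictionaries, and the `restricted` key are hypothetical stand-ins. It only shows why a scraper that skips restricted bills leaves a stale, public-looking copy behind once a bill has already been scraped.

```python
def scrape(legistar_matters, database):
    """Toy scraper: copy unrestricted matters into `database`, skip the rest."""
    for matter in legistar_matters:
        if matter["restricted"]:
            # Skipping means a copy already in the database is never touched.
            continue
        database[matter["id"]] = dict(matter)
    return database


database = {}

# Night 1: the bill is public, so it is scraped.
scrape([{"id": "2018-0513", "restricted": False}], database)

# Night 2: the box is checked in Legistar, so the scraper skips the bill...
scrape([{"id": "2018-0513", "restricted": True}], database)

# ...and the database still holds the old, unrestricted copy.
stale_copy = database["2018-0513"]
```

In this model the stored copy never learns that the bill became restricted, which matches the behavior described for 2018-0513.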

We want to fully ensure that "hidden" bills do not appear on the Councilmatic interface. I can think of several not entirely optimal solutions:

ONE ping Legistar when we import agenda items; this would entail running the import with an update_since timestamp for the last 24 hours (can we do that in cron?), so that we get the most up-to-date event data (n.b. the event scraper scrapes all events every night).

TWO ping Legistar in is_viewable. This would add load time to the search page, and it would assume that Legistar is up-and-running (that's not a great dependency to have).

import requests

def is_viewable(self):
    # Viewable only if Legistar serves the bill's source page.
    r = requests.head(self.source_url)
    return r.status_code == 200

THREE ping Legistar for the 20 (or fewer) bills that get rendered in the search view. This would limit load time, but would, again, add a risky dependency on Legistar.
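
Option THREE might look roughly like the sketch below. It is only an illustration: `source_is_live`, `filter_page`, the `source_url` key, and the timeout value are assumptions, and it uses the standard library's `urllib` rather than `requests` so that it runs without extra dependencies.

```python
import urllib.error
import urllib.request


def source_is_live(url, timeout=3.0):
    """Return True if a HEAD request to `url` answers with HTTP 200."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Legistar unreachable, page missing, or bad URL: treat as hidden.
        return False


def filter_page(bills, check=source_is_live, limit=20):
    """Check only the bills on the current results page (at most `limit`)."""
    page = bills[:limit]
    return [bill for bill in page if check(bill["source_url"])]
```

Note the trade-off in this sketch: it fails closed, so if Legistar is down every bill on the page is treated as hidden. That is the risky dependency mentioned above. The `check` parameter is injectable so the filter can be exercised without network access.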

@shrayshray, could I think about this a bit more before landing on a solution?

shrayshray commented 5 years ago

@reginafcompton of course, think it out! This is of the very highest priority to resolve, but I understand it's complicated and takes time.

reginafcompton commented 5 years ago

@shrayshray - we have a fix in place for this! Here's what's new.

The scraper now grabs all private bills (i.e., bills with MatterRestrictViewViaWeb set to true). The scrape, however, captures very limited information about these bills, so that our OpenCivicData API does not expose private data. The significant data points are the timestamp and the value of MatterRestrictViewViaWeb.

We then import these bills to the Metro database, but the view logic hides them.

In other words, the scraper helps us keep track of which bills are public or private and, hence, whether we should show or hide a bill.
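
A rough sketch of this approach, not the actual scraper or view code. `MatterRestrictViewViaWeb` is the Legistar field named above; the other keys (`MatterId`, `MatterLastModifiedUtc`, `MatterTitle`) and both helper functions are illustrative assumptions.

```python
def scrape_matter(matter):
    """Build the record to store for one Legistar matter (a plain dict here)."""
    record = {
        "id": matter["MatterId"],
        "updated_at": matter["MatterLastModifiedUtc"],
        "restrict_view": matter["MatterRestrictViewViaWeb"],
    }
    if not record["restrict_view"]:
        # Only public bills carry descriptive detail into the public API.
        record["title"] = matter["MatterTitle"]
    return record


def visible_bills(records):
    """View-side filter: never render a record flagged as restricted."""
    return [r for r in records if not r["restrict_view"]]
```

Because restricted bills are scraped rather than skipped, a later change to the checkbox in Legistar updates the stored flag on the next scrape, and the view can hide or show the bill accordingly.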

Example

For example, report 2018-0660 is currently marked as private.

You can see its full title and other info in the Metro API (by including the token): https://webapi.legistar.com/v1/metro/matters/5391?token=SECRET_TOKEN

Conversely, its entry in the OCD API omits any consequential detail.

Finally, a search for this report on Councilmatic returns zero results.


Can you let me know if you have any questions, and when/if you feel ready to close this issue?