Metro-Records / la-metro-councilmatic

:metro: An instance of councilmatic for LA Metro
MIT License

"Ver en español" link not visible #393

Open reginafcompton opened 5 years ago

reginafcompton commented 5 years ago

During the January 24 board meeting, the "Ver en español" link was not visible.

The problem originated in the scraper, which did not pair the Spanish and English events. It's not immediately clear why this occurred. Here's what we know:

Is it possible that the nightly scrape does not behave as expected, and that the windowed scrape for events behaves differently?

Deferring to the scraper expertise of @hancush on this one!

hancush commented 5 years ago

I am wondering if this is related to (or would have been caught by) https://github.com/opencivicdata/scrapers-us-municipal/pull/245, however the date and time matched in this instance, so it's not directly analogous...

jmithani commented 5 years ago

Diagnosis

Running Hannah's scraper PR linked above, there is currently one unmatched Spanish event.

unpaired_events
"{'Name': {'label': 'Board of Directors - Regular Board Meeting (SAP)', 'url': 'https://metro.legistar.com/DepartmentDetail.aspx?ID=37212&GUID=41A3FBF5-236F-4B6A-817C-FF59782DC0A0'}, 'Meeting Date': '9/27/2018', 'iCalendar': {'url': 'https://metro.legistar.com/View.ashx?M=IC&ID=640105&GUID=96793504-41F6-4027-9BA4-168280ADD86A'}, 'Meeting Time': '9:00 AM', 'Meeting Location': 'One Gateway Plaza, Los Angeles, CA 90012, \\r\\n3rd Floor, Metro Board Room', 'Meeting Details': 'Meeting\\xa0details', 'Agenda': 'Not\\xa0available', 'Recap/Minutes': 'Not\\xa0available', 'Audio': {'label': 'Audio', 'url': 'https://metro.legistar.com//Video.aspx?Mode=Granicus&ID1=944&Mode2=Video'}, 'eComment': 'Not\\xa0available'}"

Looking at the English version on Metro's site, the meeting actually started at 9:30. _(Looking at the page for the unpaired_events in Legistar, there is a notice it was removed, but I'm not sure the URL is correct.)_

English/Spanish events are matched by the combination of Committee, Date, and Time. Since the Time was mismatched, the events weren't paired.
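To make the mismatch concrete, here is a minimal sketch of a pairing key, assuming events are plain dicts and that Spanish events are identified by the " (SAP)" suffix on the body name. The `pairing_key` helper and the dict fields are hypothetical and are not taken from the scraper itself; the English start time of 9:30 is assumed to be AM.

```python
SAP_SUFFIX = " (SAP)"  # assumption: Spanish events carry this suffix in their name


def pairing_key(event, use_time=True):
    """Build the key used to match a Spanish event to its English counterpart."""
    name = event["name"]
    if name.endswith(SAP_SUFFIX):
        name = name[: -len(SAP_SUFFIX)]
    key = (name, event["date"])
    if use_time:
        key += (event["time"],)
    return key


# Values from the unpaired event above; the English time is assumed to be 9:30 AM.
english = {
    "name": "Board of Directors - Regular Board Meeting",
    "date": "9/27/2018",
    "time": "9:30 AM",
}
spanish = {
    "name": "Board of Directors - Regular Board Meeting (SAP)",
    "date": "9/27/2018",
    "time": "9:00 AM",
}

# With Time in the key the two events do not match; without it they do.
keys_match_with_time = pairing_key(english) == pairing_key(spanish)
keys_match_without_time = (
    pairing_key(english, use_time=False) == pairing_key(spanish, use_time=False)
)
```

With Time included, a single edit to either event's start time is enough to break the pairing, which is what the question under Solutions is getting at.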

How does that apply to the original problem?

For the problem documented by Regina, there could have been a Spanish event with an incorrect Committee/Date/Time made, and then deleted and replaced by another that the scraper didn't pick up until it scraped a wider window.

Solutions

Question: Is matching by Time necessary? Has there ever been a conflict, or is it a precautionary measure?

Once this is answered, we can see if we can change the key to match on to be just committee and date, or we will need to identify a third field to link them.

Idea: see if the ID for the audio is always sequential (e.g. http://metro.granicus.com/MediaPlayer.php?view_id=2&clip_id=944 and http://metro.granicus.com/MediaPlayer.php?view_id=2&clip_id=943), then implement a check to see if three of four conditions are met, which would result in matching two events.

hancush commented 5 years ago

@jmithani thank you for your detailed diagnosis!

re: your solution, that's a great question. when the first incident occurred, shelly indicated she'd see whether the board secretary's office had feedback about using name, date, and time to pair events. want to follow up on that point? checking for three of four seems a little bit unwieldy, however if checking for all three is indeed necessary, i think it would be reasonable to ask metro to update the time of the event breaking the logic.

returning to the instance that prompted this issue, it seems like the english and spanish events do match on all three fields, so i'm not sure that the assertion in https://github.com/opencivicdata/scrapers-us-municipal/pull/245 would have caught this problem. in the scraper, we tried to implement logic so it's ok if an english event is not paired, but all spanish events should be paired. maybe some events are getting lost in that control flow? in a new pr, could you add some logging to the events scraper in places where we are allowing events to go unpaired? here, for example.

jmithani commented 5 years ago

We're waiting to hear back from Metro (cc: @shrayshray) about whether there are multiple meetings for the same committee on the same day. If not, we will change the logic to pair events to rely only on committee name and date, resolving one cause of this issue.

We're still investigating the issues on the January 24th meeting. In the future, it is possible that a webcast can begin, stop, then restart. @hancush and I will proactively look into how this would affect the scraper, but more information is likely needed before we can have a definitive answer.

jmithani commented 5 years ago

We are also implementing more robust logging that will give us more information about instances where events go unpaired, for quicker diagnosis and resolution of the problem.

shrayshray commented 5 years ago

@jmithani I just got the response we've been waiting for: "I don’t see any reason to use the meeting times if the meeting name and date are sufficient. Meetings in both languages will always convene at the same time."

jmithani commented 5 years ago

Next week (to not interfere with tomorrow's board meeting), we will merge in two changes in the scraper to address this issue.

First, we will remove time as a necessity for matching English/Spanish events, as that is a known cause of errors when searching for pairs. Second, we have added in checks and logging to catch any unmatched events. This proactively tells us if there are unmatched events (versus finding that the Spanish broadcast link doesn't appear), so we can more immediately solve the problem.
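As a rough illustration of the two changes (not the code in the scraper PRs), the sketch below pairs events on name and date only and logs any Spanish event left without a partner. The `pair_events` function, the dict fields, and the "Example Committee" event are all hypothetical.

```python
import logging

logger = logging.getLogger(__name__)

SAP_SUFFIX = " (SAP)"  # assumption: Spanish events carry this suffix in their name


def pair_events(events):
    """Pair Spanish events with English ones on (name, date), ignoring time."""
    english = {}
    spanish = []
    for event in events:
        if event["name"].endswith(SAP_SUFFIX):
            spanish.append(event)
        else:
            english[(event["name"], event["date"])] = event

    pairs, unpaired = [], []
    for event in spanish:
        key = (event["name"][: -len(SAP_SUFFIX)], event["date"])
        partner = english.get(key)
        if partner is None:
            # An unpaired English event is acceptable; an unpaired Spanish one is not.
            logger.warning(
                "Unpaired Spanish event: %s on %s", event["name"], event["date"]
            )
            unpaired.append(event)
        else:
            pairs.append((partner, event))
    return pairs, unpaired


events = [
    {"name": "Board of Directors - Regular Board Meeting", "date": "9/27/2018", "time": "9:30 AM"},
    {"name": "Board of Directors - Regular Board Meeting (SAP)", "date": "9/27/2018", "time": "9:00 AM"},
    # Hypothetical Spanish event with no English counterpart, to exercise the warning.
    {"name": "Example Committee (SAP)", "date": "9/27/2018", "time": "1:00 PM"},
]
pairs, unpaired = pair_events(events)
```

Here the board meeting pairs despite the differing start times, and the hypothetical committee event is logged and returned as unpaired.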

jmithani commented 5 years ago

The changes for this have been merged. https://github.com/opencivicdata/scrapers-us-municipal/pull/285 https://github.com/opencivicdata/scrapers-us-municipal/pull/284

Is this issue ready to be closed @shrayshray? Or would you prefer to keep it open through the next Board meeting?

shrayshray commented 5 years ago

@jmithani let's leave it open until after the next Board meeting.

hancush commented 4 years ago

looks like our logging caught one instance where a spanish event couldn't be paired with an english event over the weekend: https://sentry.io/organizations/datamade/issues/1224133462/?project=56420.

n.b., this issue appears to have resolved itself.

it looks like the implicated spanish event was updated friday:

<GranicusEvent xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.datacontract.org/2004/07/LegistarWebAPI.Models.v1">
<EventAgendaFile i:nil="true"/>
<EventAgendaLastPublishedUTC i:nil="true"/>
<EventAgendaStatusId>9</EventAgendaStatusId>
<EventAgendaStatusName>Draft</EventAgendaStatusName>
<EventBodyId>231</EventBodyId>
<EventBodyName>Board of Directors - Regular Board Meeting (SAP)</EventBodyName>
<EventComment i:nil="true"/>
<EventDate>2019-09-26T00:00:00</EventDate>
<EventGuid>2DF55036-BC57-4EF6-A990-C83DC2CF9E53</EventGuid>
<EventId>1620</EventId>
<EventInSiteURL>
https://metro.legistar.com/MeetingDetail.aspx?LEGID=1620&GID=557&G=A5FAA737-A54D-4A6C-B1E8-FF70F765FA94
</EventInSiteURL>
<EventItems/>
<EventLastModifiedUtc>2019-09-13T22:56:53.35</EventLastModifiedUtc>
<EventLocation>
One Gateway Plaza, Los Angeles, CA 90012, 3rd Floor, Metro Board Room
</EventLocation>
<EventMinutesFile i:nil="true"/>
<EventMinutesLastPublishedUTC i:nil="true"/>
<EventMinutesStatusId>9</EventMinutesStatusId>
<EventMinutesStatusName>Draft</EventMinutesStatusName>
<EventRowVersion>AAAAAAD7Tdo=</EventRowVersion>
<EventTime>10:00 AM</EventTime>
<EventVideoPath i:nil="true"/>
<EventVideoStatus>Public</EventVideoStatus>
</GranicusEvent>

but the english event hasn't been updated since may.

<GranicusEvent xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.datacontract.org/2004/07/LegistarWebAPI.Models.v1">
<EventAgendaFile i:nil="true"/>
<EventAgendaLastPublishedUTC i:nil="true"/>
<EventAgendaStatusId>9</EventAgendaStatusId>
<EventAgendaStatusName>Draft</EventAgendaStatusName>
<EventBodyId>138</EventBodyId>
<EventBodyName>Board of Directors - Regular Board Meeting</EventBodyName>
<EventComment i:nil="true"/>
<EventDate>2019-09-26T00:00:00</EventDate>
<EventGuid>525B74F3-7C7F-48C3-AC95-83337CC32068</EventGuid>
<EventId>1566</EventId>
<EventInSiteURL>
https://metro.legistar.com/MeetingDetail.aspx?LEGID=1566&GID=557&G=A5FAA737-A54D-4A6C-B1E8-FF70F765FA94
</EventInSiteURL>
<EventItems/>
<EventLastModifiedUtc>2019-05-28T16:54:24.023</EventLastModifiedUtc>
<EventLocation>
One Gateway Plaza, Los Angeles, CA 90012, 3rd Floor, Metro Board Room
</EventLocation>
<EventMinutesFile i:nil="true"/>
<EventMinutesLastPublishedUTC i:nil="true"/>
<EventMinutesStatusId>9</EventMinutesStatusId>
<EventMinutesStatusName>Draft</EventMinutesStatusName>
<EventRowVersion>AAAAAADwyPA=</EventRowVersion>
<EventTime>10:00 AM</EventTime>
<EventVideoPath i:nil="true"/>
<EventVideoStatus>Public</EventVideoStatus>
</GranicusEvent>

so, it seems like the spanish event was captured in a windowed scrape. some possible explanations for why it was unpaired:

a good next step would be for us to add tests for these cases to our scraper.

again, although this has not always been the case in the past, the issue seems to have resolved itself this time. (no further scrape failures, and both english and spanish links represented in the ocd api: https://ocd.datamade.us/ocd-event/b637da42-f1e2-4687-9456-69269066ab15/.)

hancush commented 4 years ago

Eureka! We got another alert of unpaired Spanish events yesterday, so I set aside some time to delve into this. There was a very subtle bug that prevented the scraper from recognizing partial scrapes, i.e., it never tried to find the corresponding English event for unpaired Spanish events picked up in windowed scrapes.

I repaired the bug in a pair of pull requests (https://github.com/opencivicdata/scrapers-us-municipal/pull/304 and https://github.com/opencivicdata/python-legistar-scraper/pull/100) and deployed the changes.
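For anyone following along, here is a simplified sketch of the idea, not the code from those pull requests: when a windowed scrape returns a Spanish event without its English partner, look the partner up explicitly instead of leaving the event unpaired. The `pair_partial_scrape` function and the `lookup_english` callable are hypothetical stand-ins; the event IDs and date come from the September example earlier in this thread.

```python
SAP_SUFFIX = " (SAP)"  # assumption: Spanish events carry this suffix in their name


def english_name(spanish_name):
    """Strip the Spanish suffix to get the English body name."""
    if spanish_name.endswith(SAP_SUFFIX):
        return spanish_name[: -len(SAP_SUFFIX)]
    return spanish_name


def pair_partial_scrape(scraped_events, lookup_english):
    """Pair Spanish events, falling back to a lookup for partners outside the window.

    lookup_english is a hypothetical callable (name, date) -> event dict or None,
    standing in for a request to the upstream events source.
    """
    english = {
        (e["name"], e["date"]): e
        for e in scraped_events
        if not e["name"].endswith(SAP_SUFFIX)
    }
    pairs, unpaired = [], []
    for event in scraped_events:
        if not event["name"].endswith(SAP_SUFFIX):
            continue
        key = (english_name(event["name"]), event["date"])
        # Try the scraped window first, then the explicit lookup.
        partner = english.get(key) or lookup_english(*key)
        if partner is None:
            unpaired.append(event)
        else:
            pairs.append((partner, event))
    return pairs, unpaired


# Stand-in for everything the upstream source knows about.
ALL_ENGLISH = {
    ("Board of Directors - Regular Board Meeting", "2019-09-26"): {
        "name": "Board of Directors - Regular Board Meeting",
        "date": "2019-09-26",
        "id": 1566,
    }
}

# The windowed scrape only sees the recently modified Spanish event.
window = [
    {"name": "Board of Directors - Regular Board Meeting (SAP)", "date": "2019-09-26", "id": 1620}
]
pairs, unpaired = pair_partial_scrape(
    window, lambda name, date: ALL_ENGLISH.get((name, date))
)
```

In this example the English event was last modified in May, so it falls outside the window, and only the fallback lookup lets the Spanish event find its partner.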

I'll monitor through Thursday's Board meeting. If everything goes to plan, I think it's safe to close this, @shrayshray!

shrayshray commented 4 years ago

Though I believe pull #309 is intended to fix this in the long run, I just wanted to note here that we had another instance of an unpaired SAP event causing problems: agendas were not posting due to an unpaired event on 1/10/2020.

hancush commented 4 years ago

Well spotted, @shrayshray! Forgot to update this thread. We found that there were some places in the scraper code that were still using start time to pair English and Spanish events. I completely removed the constraint and added a test to verify that events where the start time doesn’t match would be paired in https://github.com/opencivicdata/scrapers-us-municipal/pull/309, which was merged on Jan. 14, the week after those events went unpaired.