cal-itp / reports

GTFS data quality reports for California transit providers
https://reports.calitp.org
GNU Affero General Public License v3.0

Add schedule compliance checks to reports site #256

Closed: owades closed this 1 year ago

owades commented 1 year ago

Description

This change updates the reports site to replace the file-presence checks with GTFS Schedule compliance checks.

The check logic is simplified slightly for this purpose, with all checks being marked as either:

I will need to think through the impact of this a bit more (with Laurie's help, surely), to see whether we need to break this into more categories or describe better.

Type of change

How has this been tested?

So far I've just been spot checking on a few organizations. I expect we'll do more detailed QA prior to launching (hopefully this can be next week).

Screenshots (optional)

This is AC Transit for January '23. [Screenshot: Screen Shot 2023-05-24 at 5 13 52 PM]

edasmalchi commented 1 year ago

> I think that we only generate reports for organizations that have an assessed schedule dataset (some of the slightly gnarly logic in here). So I think that every organization with a report will pass the "has GTFS Schedule data" check by definition. Do we want to change that logic and include organizations without schedule data? Or do we want to consider not displaying that check because it is kind of redundant? No worries if we want to keep it how it is now, this may be fine but I just wanted to note it.

Let's keep it! After all, as we add more checks there will likely be more red showing up, so we might as well let viewers feel a sense of accomplishment with a "free" green check.
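(Purely for illustration, a minimal SQL sketch of the reasoning above; the table and column names are assumptions standing in for the actual eligibility logic, not the real warehouse models.)

```sql
-- Hypothetical sketch, not the real model: the report universe is already
-- limited to organizations with an assessed schedule dataset, so the
-- "has GTFS Schedule data" check is true by construction for every report.
select
    organization_name,
    true as has_gtfs_schedule_data  -- the "free" green check
from organizations  -- assumed placeholder name
where has_assessed_schedule_dataset  -- assumed eligibility flag
```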

To Laurie's point on the manual checks, I think a "date assessed" might make things a little cluttered but perhaps we could note that those checks are infrequent somehow (like with an asterisk and a note further down like "assessed manually about every n months, contact us with any questions")

Agreed on the mart table upstream, let's do that

owades commented 1 year ago

Thank you @lauriemerrell, and you are an essential reviewer! I just hadn't gotten around to tagging folks beyond the auto-tag first. I agree with Eric's replies above and will follow through on the suggestion to explain check frequency.

> I think that this simplification sounds good, my only quibble is that I think it would be preferable to add this as a column in the actual mart table (maybe it could be called reports_status), just so that for troubleshooting purposes we have a very easy way to check what we think the status will be on the reports site. (One of the goals of the v2 rewrite for the reports site was to have minimal data-related logic in the reports generation code and try to have it all in the warehouse where it's easier to check, for example you can just see it all in Metabase in advance without having to run reports generation code.)

I totally agree, thanks for suggesting this. I created this ticket as a followup there. I still need to work out a dbt issue, hopefully will get it sorted out this week and can draft a PR. One nice thing about the existing setup is that we can technically make this change after going live, since the outputs of fct_monthly_reports_site_organization_guideline_checks don't need to change. It would be best to do this before going live, though.
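(For illustration, a minimal dbt-style sketch of what a reports_status column in the checks mart could look like; the upstream model name, columns, and status labels here are assumptions, not the actual warehouse schema.)

```sql
-- Hypothetical sketch: derive reports_status inside the mart model so the
-- reports site (and Metabase) can read the final status without re-deriving it.
with checks as (
    select * from {{ ref('int_organization_guideline_checks') }}  -- assumed upstream model
)

select
    organization_key,
    check_name,
    case
        when check_passed then 'PASS'          -- assumed labels for the simplified statuses
        when check_passed is null then 'N/A'
        else 'FAIL'
    end as reports_status
from checks
```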

lauriemerrell commented 1 year ago

That sounds good, thanks @owades.

Just re:

> since the outputs of fct_monthly_reports_site_organization_guideline_checks don't need to change.

Technically I think it would be ideal if the reports generation code looked at reports_status once that column exists, but it's not a huge deal.

owades commented 1 year ago

@lauriemerrell, do you know/remember if we are tracking which checks are manual versus automated? A binary value of "is_manual" could control something like an asterisk here.

If we didn't have a note about the check frequency, a complaint from an organization about their status would trigger us to re-check them and update the value. That doesn't seem horrible to me. @edasmalchi

> To Laurie's point on the manual checks, I think a "date assessed" might make things a little cluttered but perhaps we could note that those checks are infrequent somehow (like with an asterisk and a note further down like "assessed manually about every n months, contact us with any questions")

lauriemerrell commented 1 year ago

@owades I don't think we officially have an is_manual, but it wouldn't be too hard to add. I think the easiest way to do so would be to add a seed file that lists the manual checks, and then we could join it in when needed. We could also just add it as a column in the reports site checks mart table and basically be like check in (list of manual checks) AS is_infrequent.
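(A rough sketch of the seed-plus-join idea, to make it concrete; the seed name, column names, and join key are assumptions.)

```sql
-- Hypothetical sketch: a seed file lists the manually-assessed checks, and a
-- left join onto the checks mart yields an is_manual / is_infrequent flag.
select
    checks.*,
    manual.check_name is not null as is_manual
from {{ ref('fct_monthly_reports_site_organization_guideline_checks') }} as checks
left join {{ ref('manual_guideline_checks') }} as manual  -- assumed seed
    using (check_name)
```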

edasmalchi commented 1 year ago

> @owades I don't think we officially have an is_manual, but it wouldn't be too hard to add. I think the easiest way to do so would be to add a seed file that lists the manual checks, and then we could join it in when needed. We could also just add it as a column in the reports site checks mart table and basically be like check in (list of manual checks) AS is_infrequent.

That sounds good. Do we think there's capacity to do that before merging this, or should we hardcode it into the report for now and revisit when that's in the warehouse?

lauriemerrell commented 1 year ago

Seems like it could make sense to hard-code it here, do a follow-up PR in data-infra with both that change and the reports_status change, and then make a cleanup PR back here to use the new functionality from data-infra. But I defer to y'all; it also may depend on whether @owades has capacity to do this himself or whether you'd want help from Data Services.

github-actions[bot] commented 1 year ago

Preview url: https://PR-256--cal-itp-reports.netlify.app

owades commented 1 year ago

Here's a screenshot of how it looks with the latest updates. Since taking this pic I've fixed the spelling of "asterisk". [Screenshot: Screen Shot 2023-05-31 at 11 48 04 AM]

owades commented 1 year ago

@edasmalchi @lauriemerrell @charlie-costanzo I've made the updates we discussed last week, would love another review from y'all!

Eric, I want to make sure you review the copy; I really defer to you on that.

edasmalchi commented 1 year ago

the copy (and everything else) looks good to me!

owades commented 1 year ago

Thank you @charlie-costanzo! I just updated the PR, could you take a look? I first ran this file and saw what a failing validation looks like, and then updated the schema to match what I expect. It now runs without any output, which I interpret to mean it passes.

> Hey Scott, thanks for taking care of this! Generating the data and building the site both seem to be working locally. My only request would be to update the schema template used by the Python test test_report_data.py, which checks the output schema (described here in the readme). The file is found here. Let me know if you're not able to do that before you go OOO or if you have any questions. Thanks again!