evansiroky closed this issue 1 month ago
I think this is because an issue in Airtable can have multiple GTFS records (or services) attached to it. For example, transit issue 1 has 5 duplicates but also 5 GTFS records attached to it.
2 sets of duplicates are due to an issue having multiple services attached to it. It could also be caused by issue_type, but I'm not seeing that happen because Airtable restricts you to only one issue type.
This happens during the unnest step in this table.
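To make the fan-out concrete, here is a minimal sketch in plain Python rather than the actual model SQL. The field names (`issue_id`, `gtfs_datasets`, `services`) and the sample values are hypothetical, and the cross-product behaviour is an assumption about how the unnest step combines the two linked fields.

```python
from itertools import product

# Hypothetical Airtable-style records; field names and values are assumptions.
issues = [
    {"issue_id": 1, "gtfs_datasets": ["a", "b", "c", "d", "e"], "services": ["s1"]},
    {"issue_id": 2, "gtfs_datasets": ["f"], "services": ["s2", "s3"]},
    {"issue_id": 3, "gtfs_datasets": ["g"], "services": ["s4"]},
]


def unnest(records):
    """Emit one row per (dataset, service) pair, mimicking an unnest / cross join."""
    rows = []
    for rec in records:
        for dataset, service in product(rec["gtfs_datasets"], rec["services"]):
            rows.append(
                {"issue_id": rec["issue_id"], "gtfs_dataset": dataset, "service": service}
            )
    return rows


rows = unnest(issues)

# Count how many output rows each issue produced.
rows_per_issue = {}
for row in rows:
    rows_per_issue[row["issue_id"]] = rows_per_issue.get(row["issue_id"], 0) + 1

print(rows_per_issue)  # {1: 5, 2: 2, 3: 1}
```

In this sketch, issue 1 fans out to 5 rows because of its 5 datasets, and issue 2 fans out to 2 rows because of its 2 services, which matches the two patterns described above.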
Solutions:
I think option 1 is the easiest, and it preserves the point of this table and the issues.
Separately, I added code so that the names of the GTFS file and service end up in the final table.
In a call @evansiroky said choice 1 is fine.
Describe the bug
I believe that the `fct_transit_data_quality_issues` table should only ever have one entry per Airtable record. However, there are times when multiple records are showing up in the `fct_transit_data_quality_issues` table. This appears to happen whenever there is a change in either the GTFS Dataset or Service associated with the Airtable issue.
To Reproduce
See metabase question: https://dashboards.calitp.org/question/2516-fct-transit-data-quality-issues-with-duplicate-entries
Expected behavior
Each Airtable record should show up once.
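As a rough illustration of that expectation, the check below flags any record key that appears more than once in a set of rows. It is only a sketch: `find_duplicate_ids`, the `issue_id` key, and the sample rows are hypothetical and not taken from the real table.

```python
from collections import Counter


def find_duplicate_ids(rows, key="issue_id"):
    """Return {id: row_count} for every id that appears more than once."""
    counts = Counter(row[key] for row in rows)
    return {record_id: n for record_id, n in counts.items() if n > 1}


# Hypothetical sample: issue 1 appears twice, issue 2 once.
sample_rows = [
    {"issue_id": 1, "service": "s1"},
    {"issue_id": 1, "service": "s2"},
    {"issue_id": 2, "service": "s3"},
]

print(find_duplicate_ids(sample_rows))  # {1: 2}
```

An empty result would correspond to the expected behavior of one row per Airtable record.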
Additional context
This is causing incorrect statistics to show up in the Transit Data Quality Issue Dashboard.