Closed edasmalchi closed 2 years ago
Research Journey:
calitp_id 170 is missing from views.gtfs_schedule_fact_daily_trips. The calitp_extracted_at/deleted_at columns are null for all rows.
Checked gtfs_schedule_fact_daily_feeds and verified that the extraction succeeded for the test date of 2021-10-30.
Looked at where the extracted_at/deleted_at columns are populated in gtfs_schedule_fact_daily_trips. It appears to be the subtable daily_service_trips, where stg_daily_service_keyed is joined to views.gtfs_schedule_dim_trips on calitp_extracted_at <= service_date and calitp_deleted_at > service_date.
calitp_extracted_at/deleted_at are populated correctly in gtfs_schedule_dim_trips. Both tables use the date data type, and there is no leading or trailing whitespace that I can discern from the JSON output.
Looked at gtfs_schedule_type2.calendar_clean, which does not have a calitp_itp_id of 170. The id is present in calendar_dates and calendar_dates_clean but not in calendar or calendar_clean.
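The join condition on calitp_extracted_at and calitp_deleted_at described in these notes is a half-open validity window. A minimal Python sketch of that check follows, only to illustrate why a null bound drops the match. The function name is hypothetical and the dates are made up; the real logic lives in the warehouse SQL.

```python
from datetime import date


def version_active_on(extracted_at, deleted_at, service_date):
    """Return True if a row version is valid on service_date.

    Mirrors: extracted_at <= service_date AND deleted_at > service_date.
    """
    # In SQL a comparison against NULL is never true, so a row with a
    # missing bound cannot match. None stands in for NULL here.
    if extracted_at is None or deleted_at is None:
        return False
    return extracted_at <= service_date < deleted_at


test_date = date(2021, 10, 30)
in_window = version_active_on(date(2021, 10, 1), date(2021, 11, 15), test_date)
null_bounds = version_active_on(None, None, test_date)
```

With these made-up dates, in_window is True and null_bounds is False.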
For the purposes of this ticket I then started to research calitp_id 16, 48, 208, 372:
So it seems that for calitp_id 170, 16, and 48 we are receiving calendar_dates.txt rather than calendar.txt, which leaves calitp_extracted_at and calitp_deleted_at null in gtfs_schedule_fact_daily_trips.
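One way to confirm which service files a feed actually ships is to inspect the zip directly. The sketch below assumes a local copy of the feed zip (or an in-memory one, as in the demo); the function name is hypothetical and this is not part of the pipeline.

```python
import io
import zipfile


def service_files_in_feed(feed_zip):
    """Report whether a GTFS zip contains calendar.txt and calendar_dates.txt.

    feed_zip may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(feed_zip) as zf:
        # Compare on the base name in case the files sit in a subfolder.
        names = {name.rsplit("/", 1)[-1] for name in zf.namelist()}
    return {f: f in names for f in ("calendar.txt", "calendar_dates.txt")}


# Demo with a made-up feed that only ships calendar_dates.txt.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("calendar_dates.txt", "service_id,date,exception_type\n")
buffer.seek(0)
demo_result = service_files_in_feed(buffer)
```

For the demo feed, demo_result reports calendar.txt as False and calendar_dates.txt as True.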
After discussing at the standup: per the GTFS guidelines (https://developers.google.com/transit/gtfs/reference#calendartxt), agencies are only required to send either calendar.txt or calendar_dates.txt, so this is something we should handle on our end. Looking deeper into the select statements in the date_includion subtable built from calendar_dates, we do not select calitp_extracted_at/deleted_at as part of the data to include. The columns are therefore not present on the right side of the join, so they do not get populated in the full join lower down. Adding those columns to that subtable populates them for 170.
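The effect described here can be reproduced outside the warehouse. The following is a rough Python sketch, not the actual SQL: it emulates a full join that coalesces the two sides, with hypothetical row shapes, an assumed join key of (service_id, service_date), and made-up dates.

```python
def coalesce(*values):
    """Return the first value that is not None, like SQL COALESCE."""
    return next((v for v in values if v is not None), None)


def full_join_service(calendar_rows, date_include_rows):
    """Emulate a full outer join of calendar-based and calendar_dates-based service."""
    left = {(r["service_id"], r["service_date"]): r for r in calendar_rows}
    right = {(r["service_id"], r["service_date"]): r for r in date_include_rows}
    joined = []
    for key in sorted(left.keys() | right.keys()):
        l_row, r_row = left.get(key, {}), right.get(key, {})
        joined.append({
            "service_id": key[0],
            "service_date": key[1],
            # Each column can only be filled from a side that selected it.
            "calitp_extracted_at": coalesce(
                l_row.get("calitp_extracted_at"), r_row.get("calitp_extracted_at")),
            "calitp_deleted_at": coalesce(
                l_row.get("calitp_deleted_at"), r_row.get("calitp_deleted_at")),
        })
    return joined


# A feed with no calendar.txt: every row comes from the calendar_dates side.
dates_only = [{"service_id": "WKD", "service_date": "2021-10-30"}]
dates_with_versions = [{**dates_only[0],
                        "calitp_extracted_at": "2021-10-01",
                        "calitp_deleted_at": "2021-11-15"}]

before = full_join_service([], dates_only)
after = full_join_service([], dates_with_versions)
```

In before, both version columns are None because the right side never carried them. In after, they are filled from the calendar_dates side, which matches the behavior seen for 170 once the columns are added to the subtable.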
Describe the bug
When filtering for Long Beach Transit (itp id 170), calitp_extracted_at and calitp_deleted_at are empty in gtfs_schedule_fact_daily_trips.
To Reproduce
Steps to reproduce the behavior:
Run the query below and check calitp_extracted_at and calitp_deleted_at:
tbl.views.gtfs_schedule_fact_daily_trips() >> filter(_.calitp_itp_id == 170)
Expected behavior
calitp_extracted_at and calitp_deleted_at data present, which helps analysts run faster/smaller queries on a time period of interest.
Additional context
itp_ids 16, 48, 208, and 372 may have a similar issue.