cal-itp / data-infra

Cal-ITP data infrastructure
https://docs.calitp.org/data-infra
GNU Affero General Public License v3.0

Deciding what to do for reporting agencies with multiple feeds / duplicate feeds #154

Closed machow closed 3 years ago

machow commented 3 years ago

cc @e-lo @mcplanner @hunterowens

Problem

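We will be generating reports and communicating with agencies about their feeds. However, there are at least two edge cases to consider:

  • Complementary feeds. Agencies like LA Metro that split their rail and bus data into two feeds.
  • Duplicative feeds. Agencies that might have multiple feeds with identical info.

How should we generate reports in these cases? My sense is that, in terms of effort:

  • easy: generate a report for each feed. Agencies with multiple feeds deal with 1 email or report link per feed.
  • medium: mark duplicative feeds, and only generate 1 report for them. Agencies w/ complementary feeds still get multiple reports.
  • hard: aggregate complementary feeds into a single report. Could be by representing them as a single feed in the data (would prefer not to, since we'd have to check that it doesn't screw up joining data), or aggregating metrics by removing duplicative feeds and grouping by itp_id.

I'd imagine that if it only impacts a couple of feeds, then the easy approach should be okay? And unless it's a very common or high-impact issue, we could rule the hard approach out?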
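
To make the trade-off concrete, here is a minimal sketch of how the three options would differ in the number of reports produced: one report per feed (easy), one report per feed not marked duplicative (medium), and one report per agency after dropping duplicates and grouping by itp_id (hard). The record layout, the `url_number` and `is_duplicate` field names, the ID values, and the `report_units` helper are all assumptions for illustration, not the actual schema or pipeline code.

```python
from collections import defaultdict

# Hypothetical feed records. Field names and ID values are assumptions
# made for this sketch, not the real data model.
FEEDS = [
    {"itp_id": 1, "agency_name": "Los Angeles Metro", "url_number": 0, "is_duplicate": False},  # rail feed
    {"itp_id": 1, "agency_name": "Los Angeles Metro", "url_number": 1, "is_duplicate": False},  # bus feed
    {"itp_id": 2, "agency_name": "AC Transit", "url_number": 0, "is_duplicate": False},  # feed from MTC/511
    {"itp_id": 2, "agency_name": "AC Transit", "url_number": 1, "is_duplicate": True},   # agency-published copy
]


def report_units(feeds, approach):
    """Return a list of feed groups; each group would become one report."""
    if approach == "easy":
        # One report per feed, duplicates included.
        return [[f] for f in feeds]
    if approach == "medium":
        # Skip feeds marked duplicative; one report per remaining feed.
        return [[f] for f in feeds if not f["is_duplicate"]]
    if approach == "hard":
        # Drop duplicates, then group the remaining feeds by itp_id.
        grouped = defaultdict(list)
        for f in feeds:
            if not f["is_duplicate"]:
                grouped[f["itp_id"]].append(f)
        return list(grouped.values())
    raise ValueError(f"unknown approach: {approach!r}")


for approach in ("easy", "medium", "hard"):
    print(approach, len(report_units(FEEDS, approach)))
```

With these four sample records, the easy option yields 4 reports, medium yields 3, and hard yields 2. The medium and hard options both depend on someone maintaining the duplicate flag, which is the extra effort being weighed above.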

Copying @e-lo's comment from #146:

Rereading...I think I just don't understand the current proposal in its entirety. Translating the above to the LA Metro case, where feeds are complementary, compared to AC Transit, where they are duplicative:

la-metro:
  agency_name: Los Angeles Metro
  feeds:
      - gtfs_schedule_url: #...RAIL FEED
        gtfs_rt_vehicle_position_url: #...
        gtfs_rt_trip_updates_url: #...
        gtfs_rt_service_alerts_url: #...
      - gtfs_schedule_url: #.. BUS FEED
        gtfs_rt_vehicle_position_url: #...
        gtfs_rt_trip_updates_url: #...
        gtfs_rt_service_alerts_url: #...
ac-transit:
  agency_name: AC Transit
  feeds:
      - gtfs_schedule_url: #...feed from MTC/511
        gtfs_rt_vehicle_position_url: #...
        gtfs_rt_trip_updates_url: #...
        gtfs_rt_service_alerts_url: #...
      - gtfs_schedule_url: #.....feed published by AC Transit (duplicative of 511 feed)

  1. Would we generate separate reports for each feed in the list and then give reports/grades on ALL of them to the agencies?
  2. Or should we not even list the feed published by AC Transit itself? (Not sure what the immediate ramifications of this are, but there could be some in the long term if AC Transit decides to publish fields/extensions that MTC doesn't...probably not a huge risk.)

hunterowens commented 3 years ago

The easy approach sounds good to me. It's the right approach for every agency but LA Metro, and since they represent their data as two feeds, I think they can get two reports.

On Wed, Jun 9, 2021 at 9:56 AM Michael Chow @.***> wrote:

cc @e-lo https://github.com/e-lo @mcplanner https://github.com/mcplanner @hunterowens https://github.com/hunterowens

Problem

We will be generating reports and communicating with agencies about their feeds. However, there are at least two edge cases to consider:

  • Complementary feeds. Agencies like LA Metro that split their rail and bus data into two feeds.
  • Duplicative feeds. Agencies that might have multiple feeds with identical info.

How should we generate reports in these cases? My sense is that, in terms of effort:

  • easy: generate a report for each feed. Agencies with multiple feeds deal with 1 email or report link per feed.
  • medium: mark duplicative feeds, and only generate 1 report for them. Agencies w/ complementary feeds still get multiple reports.
  • hard: aggregate complementary feeds into a single report. Could be by representing them as a single feed in the data (would prefer not to, since we'd have to check that it doesn't screw up joining data), or aggregating metrics by removing duplicative feeds and grouping by itp_id.

