Closed evansiroky closed 2 years ago
Several of these are differences in how we store the URIs and aren't actual changes, e.g. using the 511 feed to pull AC Transit with an API key vs without, and using {{MTC_511_API_KEY}} vs inserting the actual API key.
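To keep templated and literal keys from showing up as differences, one option is to mask the key before comparing. This is only a rough sketch: the `api_key` and `operator_id` parameter names and the `example.com` URLs are assumptions for illustration, not the actual 511 URL format or anything in the pipeline.

```python
import re

PLACEHOLDER = "{{MTC_511_API_KEY}}"


def mask_api_key(url: str, param: str = "api_key") -> str:
    """Replace the value of one query parameter with the template placeholder.

    `param` is a hypothetical parameter name; adjust it to the real feed URL.
    """
    # Match "<param>=<value>" only when it directly follows "?" or "&".
    pattern = rf"(?<=[?&]){re.escape(param)}=[^&#]*"
    return re.sub(pattern, lambda m: f"{param}={PLACEHOLDER}", url)


# Hypothetical URLs standing in for the templated and literal forms.
templated = "https://example.com/feed?api_key={{MTC_511_API_KEY}}&operator_id=AC"
literal = "https://example.com/feed?api_key=abc123&operator_id=AC"

print(mask_api_key(literal) == mask_api_key(templated))
```

With both sides masked the same way, a URL that differs only in how the key is stored would no longer be reported as a mismatch.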
~I'm not sure there are any differences other than these? Looking at the dashboard you made it looks like it is listing "everything" rather than the ones just with differences. I did a few spot checks of various other listings and they all seemed OK in both locations but I'd love to know if/when there are differences that we should care about!~
We believe that @e-lo happened to look at this dashboard during the time when the pipeline was still processing data and therefore the data was not correct. This bug is noted in #1064.
Another discrepancy is unicode (Airtable) vs ASCII substitutions (in agencies.yml). Can we standardize around unicode, which is a valid URL, or is there a reason we wouldn't want to do that?
Example (these both work and go to same place):
https://www.avta.com/userfiles/files/AVTA%20GTFS.zip
vs
https://www.avta.com/userfiles/files/AVTA_GTFS.zip
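One thing worth checking here: `%20` is the percent-encoding of a space, not of an underscore, so standard decoding will not make these two URLs equal even though the server happens to serve both. A small sketch using only the standard library (the helper name `decoded_path` is made up for illustration):

```python
from urllib.parse import unquote, urlsplit


def decoded_path(url: str) -> str:
    """Return the URL path with percent-escapes decoded."""
    return unquote(urlsplit(url).path)


a = "https://www.avta.com/userfiles/files/AVTA%20GTFS.zip"
b = "https://www.avta.com/userfiles/files/AVTA_GTFS.zip"

print(repr(decoded_path(a)))  # path contains a space
print(repr(decoded_path(b)))  # path contains an underscore
print(decoded_path(a) == decoded_path(b))
```

So if both spellings should be treated as the same feed, that probably has to be resolved by picking one spelling in both places rather than by generic URL normalization.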
Sometimes URIs are case sensitive. Sometimes not.
I think we can either (a) standardize around lowercase, which would require editing a lot of URIs in both agencies.yml and airtable, or (b) resolve the discrepancies that currently exist and not worry about it until there is a problem (my preference, given that it isn't super important).
Example (both of these work)
http://www.cleanairexpress.com/GTFS/GTFS.zip
vs
http://www.cleanairexpress.com/gtfs/gtfs.zip
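If we only want to find case-only mismatches without editing the stored URIs, a comparison key might be enough. A minimal sketch, assuming we are fine folding the path case for matching purposes only (the scheme and host are case-insensitive anyway, but the path may not be on every server); `comparison_key` is a hypothetical helper, not something in the pipeline:

```python
from urllib.parse import urlsplit, urlunsplit


def comparison_key(url: str, fold_path_case: bool = False) -> str:
    """Build a key for matching URLs; never use it to fetch the feed.

    Scheme and host are always lowercased. The path is lowercased only
    when fold_path_case is True, since some servers are case sensitive.
    """
    parts = urlsplit(url)
    path = parts.path.lower() if fold_path_case else parts.path
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, parts.fragment)
    )


a = "http://www.cleanairexpress.com/GTFS/GTFS.zip"
b = "http://www.cleanairexpress.com/gtfs/gtfs.zip"

print(comparison_key(a) == comparison_key(b))  # strict: path case kept
print(comparison_key(a, True) == comparison_key(b, True))  # loose: path case folded
```

URLs that match only under the loose key would be the ones to fix by hand, which fits option (b).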
Bay Area Ferries Schedule is same in airtable and agencies.yml - not sure why this is coming up.
Santa Maria Area Transit: They let us know that they aren't using Trillium anymore (the feed hasn't been updated since October) so we deleted it from Airtable...but haven't identified a new feed source. I believe there were some meetings on the calendar to get more info from them. @o-ram were you part of the meeting with them? Otherwise it was the GRAAS team. Will investigate if/when Olivia responds.
I just updated SolTrans (keeping the Trillium feed as an archive b/c it is far superior w.r.t. data) and Tuolumne (which pointed to same google place...just using different URI schemes).
Last schedule one remaining is Tulare - which I'll address with overall tcag fix
RT: Update YoloBus
For OCTA: Airtable uses the OCTA domain - which is preferable to the swiftly one. I would advocate for changing it in agencies.yml to do this also unless there is a reason we don't have it that way there?
Updated/added SJRTD URIs
Outstanding (will address when I get back to my computer later this PM)
I'm thinking the best idea around points 1-3 is to modify the URLs in airtable to match those in agencies.yml. I can try to find some time to do that.
@e-lo regarding Santa Maria, I have been trying to figure out what they are doing with GTFS. I was in a meeting with their relatively new transit manager back in Jan. about their interest in contactless payments and learned
Since then, I downloaded TripShot myself and was able to confirm that they do appear within the App. There also appears to be some mechanism for providing RT info. I haven't been able to locate a feed URL though or get one from Santa Maria.
I'm happy to reach back out to SM and ask. The person I met with was supposed to confirm the TripShot info with their IT team and get back to me anyway and never did, so I have a good reason to ask.
I'm happy to reach back out to SM and ask. The person I met with was supposed to confirm the TripShot info with their IT team and get back to me anyway and never did, so I have a good reason to ask.
That would be awesome
@evansiroky said:
I'm thinking the best idea around points 1-3 is to modify the URLs in airtable to match those in agencies.yml. I can try to find some time to do that.
I already did 3 (fix casing)
I'm meh on changing unicode to ASCII and would prefer to do the reverse - do we have to do that for some reason in our pipeline? When we or an agency advertises a feed, we would do so with an underscore _ not a %20
Re Bay Area feeds, I think I want to keep the regional feed together and aggregate the services b/c it keeps the fares intact. I don't see value in updating to do agency-specific ones in airtable?
I already did 3 (fix casing)
Cool, thanks. I'm going to get started on some more.
- I'm meh on changing unicode to ASCII and would prefer to do the reverse - do we have to do that for some reason in our pipeline? When we or an agency advertises a feed, we would do so with an underscore _ not a %20
I also like the _ better, but am more interested in just getting this done quickly, so I'm going to update airtable to have the annoying encoded characters.
- Re Bay Area feeds, I think I want to keep the regional feed together and aggregate the services b/c it keeps the fares intact. I don't see value in updating to do agency-specific ones in airtable?
I also don't think we need to add each disaggregated service (to airtable).
I just went through the remaining URLs in agencies.yml that weren't in airtable.
Here are my responses to some of your comments:
Bay Area Ferries Schedule is same in airtable and agencies.yml - not sure why this is coming up.
This is probably happening since it occurs twice in agencies.yml and is used as a join condition. One of these feeds should probably be removed from agencies.yml.
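A quick way to check for this kind of repeat is to count the URLs before joining. This is just a sketch with made-up `example.com` URLs standing in for the entries in agencies.yml; it does not read the real file.

```python
from collections import Counter


def duplicated_urls(urls):
    """Return the URLs that appear more than once, sorted for stable output."""
    counts = Counter(urls)
    return sorted(url for url, n in counts.items() if n > 1)


# Hypothetical stand-ins for URLs pulled out of agencies.yml.
agencies_yml_urls = [
    "https://example.com/ferries/gtfs.zip",
    "https://example.com/bus/gtfs.zip",
    "https://example.com/ferries/gtfs.zip",
]

print(duplicated_urls(agencies_yml_urls))
```

Any URL this returns would match more than one row when used as a join key, which could explain a feed being flagged even though the URL is identical on both sides.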
For OCTA: Airtable uses the OCTA domain - which is preferable to the swiftly one. I would advocate for changing it in agencies.yml to do this also unless there is a reason we don't have it that way there?
It seems that the number of RT validation errors differs between their two RT feeds, so maybe they are distinct data sources. Not sure what the analysts are using.
I just updated SolTrans
Is the trip update URL in airtable correct?
Elk Grove added to agencies.yml via https://github.com/cal-itp/data-infra/pull/1224.
There are still over 20 URLs in airtable that aren't in agencies.yml. @e-lo can you take a look at these URLs? If they should not be ingested in the pipeline, it would be great to have some notes about why. Perhaps there should be some kind of flag about whether certain feeds should not be ingested in the pipeline. That might be useful for when we get around to #775.
I'm meh on changing unicode to ASCII and would prefer to do the reverse - do we have to do that for some reason in our pipeline? When we or an agency advertises a feed, we would do so with an underscore _ not a %20
I also like the _ better, but am more interested in just getting this done quickly, so I'm going to update airtable to have the annoying encoded characters.
@evansiroky is there a reason the pipeline won't accept the _ right now?
Is the trip update URL in airtable correct?
https://soltrans.connexionz.net/rtt/public/utility/gtfsrealtime.aspx/tripupdate2
Strangely, it is.
https://soltrans.connexionz.net/rtt/public/utility/gtfsrealtime.aspx/tripupdate
Also downloads data - but it is different. The one with 2 is the one posted on their website. Wondering if it is GTFS Realtime v1?
https://soltrans.connexionz.net/rtt/public/utility/gtfsrealtime.aspx/tripupdate2 seems to have more information in it compared to the other one. I went ahead and made #1231 to update.
This issue will be ongoing until #775 is resolved. The Agencies.yml vs airtable comparison notes doc will be used to track ongoing issues.
There are currently a lot of URLs that are present in agencies.yml that are not present in the airtable gtfs datasets table and also many URLs present in airtable that are not present in agencies.yml. This is detailed in the agencies.yml URLs vs airtable URIs dashboard. We should reconcile each of these URLs to make sure these data sources agree with each other.
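For reference, the reconciliation described above boils down to a pair of set differences. A minimal sketch, assuming both sources have already been reduced to plain lists of URL strings; the function name and the `example.com` URLs are hypothetical, and this is not how the dashboard itself is built.

```python
def reconcile(agencies_yml_urls, airtable_urls):
    """Split URLs into those found in only one source and those found in both."""
    yml, air = set(agencies_yml_urls), set(airtable_urls)
    return {
        "only_in_agencies_yml": sorted(yml - air),
        "only_in_airtable": sorted(air - yml),
        "in_both": sorted(yml & air),
    }


# Hypothetical inputs for illustration.
result = reconcile(
    ["https://example.com/a.zip", "https://example.com/b.zip"],
    ["https://example.com/b.zip", "https://example.com/c.zip"],
)
for bucket, urls in result.items():
    print(bucket, urls)
```

Exact string comparison like this will report encoding and casing variants as mismatches, so the first two buckets are a list of things to review rather than a list of confirmed missing feeds.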