UDST / urbanaccess

A tool for GTFS transit and OSM pedestrian network accessibility analysis by UrbanSim
https://udst.github.io/urbanaccess/index.html
GNU Affero General Public License v3.0
236 stars 56 forks source link

Unnecessary merge in _stops_agencyid()? #26

Closed kuanb closed 7 years ago

kuanb commented 7 years ago

In _stops_agencyid, a merge is made with one dataframe and another that is only one column wide. Since that column already exists in the left merged dataframe output, this step should not happen (or, it is missing another 2nd column, that should be in there!).

Happy to submit a PR to fix but wanted to check first as to what the intent was with this merge.

Location: https://github.com/UDST/urbanaccess/blob/73fcab917a3c044a5360836237910fcdd81da05e/urbanaccess/gtfs/utils_format.py#L300

This happens in a few of these helper functions, as well, such as: https://github.com/UDST/urbanaccess/blob/73fcab917a3c044a5360836237910fcdd81da05e/urbanaccess/gtfs/utils_format.py#L241 in _calendar_agencyid

Which makes me wonder if the intent is to account for/capture column values where, after the left join, there are nulls? If so, I assume that is the purpose of the null check and update here: https://github.com/UDST/urbanaccess/blob/73fcab917a3c044a5360836237910fcdd81da05e/urbanaccess/gtfs/utils_format.py#L478

sablanchard commented 7 years ago

Thank you Kuan, good question. You are correct on your second assumption. The extra merge is intended to account for stop_ids that after the previous merges drop off due to edge cases in GTFS feeds where some stops are not uniformly utilized through the entire series of feed files. This way we account for all stops regardless of their utilization across GTFS files. Nonetheless it is useful for a user to keep those stops for accounting purposes.

I will close this issue, but happy to reopen if there is an issue.

kuanb commented 7 years ago

Thanks for the clarification :)