UrbanAnalyst / gtfsrouter

Routing and analysis engine for GTFS (General Transit Feed Specification) data
https://urbananalyst.github.io/gtfsrouter/

calculate transfers only between unique station pairs for #91 #92

Closed: mpadge closed this pull request 1 year ago

codecov[bot] commented 1 year ago

Codecov Report

Merging #92 (815317f) into main (b145076) will increase coverage by 0.45%. The diff coverage is 100.00%.

:exclamation: Current head 815317f differs from the pull request's most recent head 9c3c44e. Consider uploading reports for commit 9c3c44e to get more accurate results.

@@            Coverage Diff             @@
##             main      #92      +/-   ##
==========================================
+ Coverage   82.22%   82.67%   +0.45%     
==========================================
  Files          20       21       +1     
  Lines        1783     1755      -28     
==========================================
- Hits         1466     1451      -15     
+ Misses        317      304      -13     
| Impacted Files | Coverage Δ |
|---|---|
| src/transfers.cpp | 100.00% <100.00%> (ø) |
| src/transfers.h | 100.00% <100.00%> (ø) |
| R/frequencies_to_stop_times.R | 69.86% <0.00%> (-7.14%) :arrow_down: |
| src/csa.cpp | 100.00% <0.00%> (ø) |
| R/route-headway.R | 0.00% <0.00%> (ø) |
| R/router.R | 76.56% <0.00%> (+0.41%) :arrow_up: |
| R/timetable.R | 80.68% <0.00%> (+1.43%) :arrow_up: |
| src/freq_to_stop_times.cpp | 100.00% <0.00%> (+2.85%) :arrow_up: |


jh0ker commented 1 year ago

As I said in my comment, this looks like an entirely different optimization from the one I had in mind, but that's not to say it isn't worth it – especially if you observed a 90% speedup for a large feed! Though I do wonder whether the two optimizations could be combined?

That being said, I don't see any problems with this code. I do wonder what the impact would be on a large feed with few duplicated coordinates, though. In my use case, the full DELFI feed, it looks like there are still around 475,000 unique coordinates out of about 520,000 stops. I'm happy to run it for a couple of hours and see.
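For anyone wanting to reproduce that kind of count, a minimal sketch in R, assuming a local copy of the feed and the standard stops.txt fields (stop_lon, stop_lat); the file name is illustrative only:

```r
# Sketch only: count unique stop coordinates in a GTFS feed.
library (gtfsrouter)

gtfs <- extract_gtfs ("delfi-gtfs.zip") # illustrative path
n_stops <- nrow (gtfs$stops)
n_unique <- nrow (unique (gtfs$stops [, c ("stop_lon", "stop_lat")]))
c (stops = n_stops, unique_coords = n_unique)
```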

mpadge commented 1 year ago

It shouldn't matter much on such a large feed, because most of the time goes into the main scan over each row of the transfers anyway, and re-indexing back out to all (unique + duplicated) stops should be negligible in that case. But that sure is a heck of a lot of stops!!
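To make that two-step logic concrete, here is a rough R sketch; the real implementation lives in src/transfers.cpp, and `transfer_times_between()` below is a hypothetical placeholder for the pairwise transfer-time calculation, not a gtfsrouter function:

```r
# Sketch only: compute transfers between unique coordinates, then expand
# ("re-index") back out to every stop sharing those coordinates.
library (data.table)

stops <- gtfs$stops # as loaded above with extract_gtfs()
stops [, coord_id := .GRP, by = .(stop_lon, stop_lat)]

# 1. transfers between unique coordinates only (hypothetical helper
#    returning columns from_coord, to_coord, transfer_time):
tr <- transfer_times_between (stops [!duplicated (coord_id)])

# 2. re-index back to all stops: every stop at a given coordinate
#    inherits the transfers calculated for that coordinate.
from_map <- stops [, .(from_coord = coord_id, from_stop_id = stop_id)]
to_map <- stops [, .(to_coord = coord_id, to_stop_id = stop_id)]
transfers <- merge (
    merge (tr, from_map, by = "from_coord", allow.cartesian = TRUE),
    to_map, by = "to_coord", allow.cartesian = TRUE
)
```

In this sketch, step 2 is a cheap join compared with the pairwise work in step 1, which is why the re-indexing overhead should stay negligible even when nearly all coordinates are unique.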