TravelMapping / DataProcessing

Data Processing Scripts and Programs for Travel Mapping Project

Overlap DB population and graph generation? #567

Open jteresco opened 1 year ago

jteresco commented 1 year ago

A more challenging but potentially substantial efficiency improvement in the site update process would be to start loading the primary DB with the new .sql file while graphs are being generated and written. The gain would be smaller than it once was, given how fast graph generation now is, so it might not be worth the trouble. I'm opening this as a place to record some thoughts about how this might (or might not) be accomplished, and whether it's worth the added complexity.
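
To make the idea concrete, here is a minimal sketch of the overlap, not a proposal for the actual code. `overlap_db_load_and_graphs`, `load_db`, `write_graphs`, and `UpdateResult` are all hypothetical names that do not exist in siteupdate; `load_db` stands in for whatever would import the .sql file, and `write_graphs` for the existing graph generation step.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <string>

// Hypothetical result type, for illustration only.
struct UpdateResult {
    bool db_loaded;      // what the (hypothetical) DB import reported
    int graphs_written;  // what the (hypothetical) graph step reported
};

// Hypothetical helper: runs the DB import on a second thread while
// graph generation continues on the calling thread.
UpdateResult overlap_db_load_and_graphs(
    const std::function<bool(const std::string&)>& load_db,
    const std::function<int()>& write_graphs,
    const std::string& sql_path)
{
    // Launch the import asynchronously. This assumes the part of the
    // .sql file being loaded has already been written in full.
    std::future<bool> db =
        std::async(std::launch::async, load_db, sql_path);

    // Graph generation proceeds here in the meantime.
    int n = write_graphs();

    // Wait for the import before treating the update as finished.
    return UpdateResult{db.get(), n};
}
```

The only synchronization point is `db.get()`, so the total time would be roughly the longer of the two stages rather than their sum. Whether that saving justifies the extra moving parts is exactly the open question.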

yakra commented 1 year ago

Interesting. The tables that can already be produced up to that point (those in the sqlfile1 function) are themselves written in the background during graph generation. Depending on the system siteupdate is running on and the number of threads, that write can finish at wildly different points in the process. Here's what's in the latest siteupdate.log on noreaster:

[yakra@noreaster /home/www/tm/logs]$ grep -i -C 2 pause siteupdate.log 
[57.1] Writing area graphs.
siena(2) (8,6) (8,6) (8,6) siena(2.5) (16,17) (16,17) (16,17) siena(3) (28,29) (28,29) (28,29) siena(4) (58,65) (58,65) (58,65) siena(5) (91,107) (89,105) (89,105) siena(10) (212,256) (195,239) (195,239) siena(25) (577,668) (508,600) (508,600) siena(50) (1463,1683) (1237,1457) (1237,1457) nyc(20) (988,1080) (956,1048) (956,1048) siena(100) (5565,6497) (4732,5664) (4733,5665) boston(20) (525,643) (508,626) (508,626) dc(20) (761,953) (721,913) (721,913) naples(25) (39,37) (33,31) (33,31) albuquerque(50) (356,408) (297,349) (297,349) wolfcreek(50) (166,167) (131,132) (131,132) montreal(25) (564,644) (504,584) (504,584) seattle(25) (461,503) (402,444) (402,444) innsbruck(25) (530,575) (451,496) (451,496) copenhagen(25) (512,597) (465,550) (465,550) london(25) (2371,3527) (2269,3425) (2269,3425) ubuffalo(50) (968,1143) (832,1007) (832,1007) kc(20) (485,544) (473,532) (473,532) grandisland(50) (454,495) (436,477) (436,477) omaha(30) (377,409) (355,387) (355,387) stlouis(25) (632,741) (557,667) (557,667) chicago(25) (531,663) (494,626) (494,626) sfbay(50) (918,981) (867,930) (867,930) muhlenberg(25) (621,668) (554,601) (554,601) austin(25) (286,308) (276,298) (276,298) baltimore(25) (792,991) (715,914) (715,914) mhc(25) (449,518) (376,446) (376,446) la(125) (2956,3190) (2642,2876) (2643,2877) umtc(25) (713,793) (694,774) (694,774) rmu(25) (975,1057) (855,939) (856,940) dfw(50) (1569,1779) (1526,1736) (1526,1736) houston(35) (762,837) (751,826) (751,826) sanantonio(20) (356,395) (355,394) (355,394) syracuse(25) (424,491) (357,424) (357,424) rpi(2) (20,18) (20,18) (20,18) rpi(5) (76,90) (73,87) (73,87) union(2) (12,9) (12,9) (12,9) union(5) (47,51) (46,50) (46,50) amsterdam(2) (5,4) (5,4) (5,4) amsterdam(5) (25,23) (24,22) (24,22) amsterdam(10) (78,82) (70,75) (70,75) desales(25) (653,717) (603,667) (603,667) westfield(5) (18,15) (15,13) (15,13) westfield(10) (90,96) (73,79) (73,79) westfield(25) (508,606) (441,539) (441,539) 
[60.1] Pause writing database file TravelMapping-2023-01-14@22:20:11.sql.
!
[63.6] Clearing HighwayGraph contents from memory.

yakra commented 1 year ago

There are still a few more calls to ErrorList::add_error after the call to sqlfile1, but not many: area.cpp#L17-L37, multisystem.cpp#L17-L26, multiregion.cpp#L17-L26, and fullcustom.cpp#L17-L64. If the DB were populated before all ErrorList errors are found, it could get into a bad state. We would have to move sqlfile1 to after GraphListEntry::entries is populated.
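
A rough sketch of that ordering constraint, under stated assumptions: `ErrorLog` and `populate_if_clean` are hypothetical and are not the project's ErrorList or any existing siteupdate function. The point is only that DB population is refused until error collection has been closed and has turned up nothing.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical error collector, for illustration only.
class ErrorLog {
    std::mutex mtx;
    std::vector<std::string> errors;
    bool finalized = false;
public:
    void add(const std::string& e) {
        std::lock_guard<std::mutex> lock(mtx);
        errors.push_back(e);
    }
    // Call once every step that can report an error has run.
    void finalize() {
        std::lock_guard<std::mutex> lock(mtx);
        finalized = true;
    }
    // True only when collection is closed and no errors were recorded.
    bool safe_to_populate() {
        std::lock_guard<std::mutex> lock(mtx);
        return finalized && errors.empty();
    }
};

// Hypothetical guard: runs populate_db only when the log is closed
// and clean. Returns whether populate_db was actually invoked.
template <class F>
bool populate_if_clean(ErrorLog& log, F populate_db) {
    if (!log.safe_to_populate()) return false;
    populate_db();
    return true;
}
```

In this sketch, `finalize()` would be called only after the last step that can add an error, which is the same requirement as moving sqlfile1 later.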