yakra closed this 6 months ago.
Here are links to charts of how the last three performance improvements played out. For FreeBSD in particular:
This finally brings FreeBSD's time under control across the board, even outperforming both Linux flavors on the same hardware.
Note the Y axis in milliseconds.
At first the new times for toma & lab5 showed up as 0. I was able to tweak the shell script that extracts the times from the siteupdate.logs to get one more digit of precision, rather than re-run the benchmark with `--timeprecision 5`.
Milliseconds again.
Interesting that toma has the worst performance, despite having the 3rd-highest clock speed. Could it be that it has only 3 MB last-level cache vice 8+ for everything else? Nope. There's one outlier of 0.0634s in its data; otherwise its average would be 0.0032s, better than all the other machines.
Total time for the entire .sql file, including the tables not affected by these changes.
This isn't new, but it should be noted that the .sql file is written in the background during the graph generation process.
This means that while n threads are dedicated to graph generation, the computer is running n+1 threads overall, at least until the DB is finished or the subgraph threads start returning, whichever happens first.
Freeing up resources used by this background task gives us more memory bandwidth for graph generation, which of course is itself also sped up by `sprintf`'s removal.
Compare to the results from #622, the 1st image from this post. Now there's only one bsdlab line towering over all the rest at the top of the chart, the "before" line.
Zoom in and we can see that bsdlab's graph generation time is finally out of the stratosphere, down hanging out with all the Linux machines. It even outperforms lab3 & lab4 on the same hardware. YES! A nice & relatively uncluttered chart too, with the slowest "after" lines well below the fastest "before" lines.
Same thing but with a 25s vertical scale, for easy side-by-side comparison with the zoomed-in chart from #622, the 2nd image from this post.
I'd link to the images directly, but it looks like GitHub is adding some browser session-specific tokens to the URL. If you want to manually open them in a new tab, gotta go to the post first. :(
Originally, this commit was done as a child of 3282921, but I decided to have a little fun with a merge commit. 2349b1f shows us... what if `sprintf` were removed first, instead of doing #622?
Again, only bsdlab's "before" line is way above the rest. The removal of `sprintf` was the magic bullet we needed to get the task to scale well.
Compare the 1st image from this post.
Compare the 2nd image from this post. A much bigger impact than #622 on its own. FreeBSD is already performing better than Linux. The "before" and "after" are still fairly separated, barely touching one another at 2 threads. It'd appear that applying this change to #622 has a bigger impact than applying it to fd96ac6.
This gives a clearer picture of the benefits of using `TMBitset`s under more optimal circumstances. Especially with FreeBSD, which actually has a meaningful delta now.
Same thing but with a 25s vertical scale again, if you wanna compare side-by-side with whatever.
Ready to rock!
Going to try it out now, thanks.
Under 30 seconds for a site update (minus DB loading) on my MBP.
I'm curious to know how long it took before.
While macOS has a BSD lineage, that doesn't necessarily mean it suffers the same memory bottlenecks as FreeBSD.
Another factor is that performance may not just depend on the OS, but also on the under-the-hood implementation of `sprintf` in `stdio.h`, which could well be different on the Apple platform.
Hmmm.. Production site update went from 66 seconds to 112. Graph generation specifically went from about 45 seconds to about 95.
Odd. siteupdate.log lists 6223ec7b160bcb5158b68fea38f962f37d7bd088; that's the latest one. It's not a disk cache thing; reading WPTs is finished @ 2.8s.
Could disk write access be the bottleneck?
Around the time of my last post, I noticed slow graph writes on bsdlab, which has an SSD.
Reproducing it now, I get 178.9s & 177.4s on 2 different runs.
Using `-g /dev/null`, I get 1.7s.
A few years back, I noticed widely varying results when benchmarking tasks that did a lot of writing to disk. Same executable, same dataset, same number of threads. Deleting lots of files would make these tasks run faster. Maybe the time depends on what disk sectors it's writing to? Maybe it's the OS coping with a fragmented filesystem, taking longer to find where it can store files? Either way, I started writing all output files other than siteupdate.log itself to /dev/null when doing benchmarks, to get consistent & reproducible data (and to show what the software is capable of when disk access isn't a bottleneck :grin:).
`-g /dev/null`?

> The performance-killing `sprintf` function has been removed from user log, DB file, and graph generation routines. More to come in the future; I've yet to work on NMPs and stats CSVs.
I see that you've removed the function but don't get how you log now. Can you point me to the "new" function (call)?
@yakra + @jteresco Something went wrong. Both production servers indicate 0 miles for all users now
https://forum.travelmapping.net/index.php?topic=6238.msg34610#msg34610 https://travelmapping.net/stat.php https://tmrail.teresco.org/stat.php
*puts fingers to keyboard to respond to previous post*
...Crap.
How the... It's pulling this info from the DB. And diffing the "before" and "after" versions showed no changes to the tables...
Anyway.
> I see that you've removed the function but don't get how you log now. Can you point me to the "new" function (call)?
As before, I'm using the overloaded std::ostream::operator <<.
(Overloaded operators are basically just functions with syntactic sugar on top.)
What's different now is how many times it's used and what gets inserted into the `ofstream`.
The changes in the code are too numerous to mention, but here's one example from userlogs:
Before:
A formatted string was prepared, "Traveled %i of %i (%.1f%%), Clinched %i of %i (%.1f%%) preview systems", where the bits starting with `%` were replaced by the 6 numbers on the following 2 lines. This string was then inserted into the output file stream.
After: The components that made up that string are instead individually inserted into the ofstream... Text << number << text << number << text << number << text << number << text << number << text << number << text.
```
yakra@BiggaTomato:~/ytm/DataProcessing/siteupdate/cplusplus$ diff \
  <(tail -n +12258954 _3282/TravelMapping.sql | head -n 3) \
  <(tail -n +12258954 _spr1/TravelMapping.sql | head -n 3) \
  # overallMileageByRegion table
1,3c1,3
< ('AB','9235.1470421475879','18472.872821377736')
< ,('ABW','45.461910136369539','45.461910136369539')
< ,('AFG','2493.3650900241528','2493.3650900241528')
---
> ('AB','0x1.20992d246ef83p+13','0x1.20a37dc4e3228p+14')
> ,('ABW','0x1.6bb1fdf10b2f4p+5','0x1.6bb1fdf10b2f4p+5')
> ,('AFG','0x1.37abaed1463aap+11','0x1.37abaed1463aap+11')
```
I use a custom program to diff the .sql files, which apparently reads & interprets those hex values just fine, converting them to numbers internally. MySQL, on the other hand, doesn't like them so much.
Let's `git checkout` 52ef723 and use the old version on noreaster till I can get this sorted. Probably wanna nix `floatfield`. Me slwwp now. Sleep. Me slwwp... *AHEM* sleep now.
@yakra no urgency on this since it was easy enough to jump back to a previous point.
I'll try some things out with the file writes some time soon, don't hesitate to remind me if I don't get at it soon.
I know exactly what to do. Just going a little overboard on precision & due diligence. Also typing up a post-mortem in #624 on the unlikely series of very specific events that got us here.
I'll have a pull request later today.
Closes #518!
The performance-killing `sprintf` function has been removed from user log, DB file, and graph generation routines. More to come in the future; I've yet to work on NMPs and stats CSVs.

@jteresco, I recommend a compile & test drive of this, if for no other reason than to see how blazingly fast graph generation is now. :grinning: I am curious to see though, how fast `--errorcheck` is with the userlog speedups.

One minor change: the `waypoints` table in the DB has done away with what I call "Python style floats", where a trailing `.0` is written after an integral value. `42.0` -> `42`. This could in theory happen in other tables too; it's just in `waypoints` where we see it in practice. I don't expect it'll cause any problems.

We're nearing the end of the road for big graph generation improvements. Percentagewise, this beats any of the first big five pull requests from 2019. And it wasn't even something I had planned. While I still have a few ideas left, there probably won't be anything else of this magnitude. Just smaller incremental changes, much lower impact. I do expect to get lab2 to more consistently break the 1-second barrier, until the ever-increasing size of the HighwayData repo takes that possibility (using the most recent commit) away permanently.