yakra closed this 6 months ago.
Here are links to charts of how the last three performance improvements played out. For FreeBSD in particular:
This finally brings FreeBSD's time under control across the board, even outperforming both Linux flavors on the same hardware.
Note the Y axis in milliseconds.
At first the new times for toma & lab5 showed up as 0. I was able to tweak the shell script that extracts the times from the siteupdate.logs to get one more digit of precision, rather than re-run the benchmark with `--timeprecision 5`.
Milliseconds again.
Interesting that toma has the worst performance, despite having the 3rd-highest clock speed. Could it be that it has only 3 MB last-level cache vice 8+ for everything else? Nope. There's one outlier of 0.0634s in its data; otherwise its average would be 0.0032s, better than all the other machines.
Total time for the entire .sql file, including the tables not affected by these changes.
This isn't new, but it should be noted that the .sql file is written in the background during the graph generation process.
This means that while n threads are dedicated to graph generation, the computer is running n+1 threads overall, at least until the DB is finished or the subgraph threads start returning, whichever happens first.
Freeing up resources used by this background task gives us more memory bandwidth for graph generation, which of course is itself also sped up by `sprintf`'s removal.
Compare to the results from #622, the 1st image from this post. Now there's only one bsdlab line towering over all the rest at the top of the chart, the "before" line.
Zoom in and we can see that bsdlab's graph generation time is finally out of the stratosphere, down hanging out with all the Linux machines. It even outperforms lab3 & lab4 on the same hardware. YES! A nice & relatively uncluttered chart too, with the slowest "after" lines well below the fastest "before" lines.
Same thing but with a 25s vertical scale, for easy side-by-side comparison with the zoomed-in chart from #622, the 2nd image from this post.
I'd link to the images directly, but it looks like GitHub is adding some browser session-specific tokens to the URL. If you want to manually open them in a new tab, gotta go to the post first. :(
Originally, this commit was done as a child of 3282921, but I decided to have a little fun with a merge commit. 2349b1f shows us... what if `sprintf` were removed first, instead of doing #622?
Again, only bsdlab's "before" line is way above the rest. The removal of `sprintf` was the magic bullet we needed to get the task to scale well.
Compare the 1st image from this post.
Compare the 2nd image from this post. A much bigger impact than #622 on its own. FreeBSD is already performing better than Linux. The "before" and "after" are still fairly separated, barely touching one another at 2 threads. It'd appear that applying this change to #622 has a bigger impact than applying it to fd96ac6.
This gives a clearer picture of the benefits of using `TMBitset`s under more optimal circumstances. Especially with FreeBSD, which actually has a meaningful delta now.
Same thing but with a 25s vertical scale again, if you wanna compare side-by-side with whatever.
Ready to rock!
Going to try it out now, thanks.
Under 30 seconds for a site update (minus DB loading) on my MBP.
I'm curious to know how long it took before.
While macOS has a BSD lineage, that doesn't necessarily mean it suffers the same memory bottlenecks as FreeBSD.
Another factor is that performance may not just depend on the OS, but also on the under-the-hood implementation of `sprintf` in `stdio.h`, which could well be different on the Apple platform.
Hmmm.. Production site update went from 66 seconds to 112. Graph generation specifically went from about 45 seconds to about 95.
Odd. siteupdate.log lists 6223ec7b160bcb5158b68fea38f962f37d7bd088; that's the latest one. It's not a disk cache thing; reading WPTs is finished @ 2.8s.
Could disk write access be the bottleneck?
Around the time of my last post, I noticed slow graph writes on bsdlab, which has an SSD.
Reproducing it now, I get 178.9s & 177.4s on 2 different runs.
Using `-g /dev/null`, I get 1.7s.
A few years back, I noticed widely varying results when benchmarking tasks that did a lot of writing to disk. Same executable, same dataset, same number of threads. Deleting lots of files would make these tasks run faster. Maybe the time depends on what disk sectors it's writing to? Maybe it's the OS coping with a fragmented filesystem, taking longer to find where it can store files? Either way, I started writing all output files other than siteupdate.log itself to /dev/null when doing benchmarks, to get consistent & reproducible data (and to show what the software is capable of when disk access isn't a bottleneck :grin:).
`-g /dev/null`?

> The performance-killing `sprintf` function has been removed from user log, DB file, and graph generation routines. More to come in the future; I've yet to work on NMPs and stats CSVs.
I see that you've removed the function but don't get how you log now. Can you point me to the "new" function (call)?
@yakra + @jteresco Something went wrong. Both production servers indicate 0 miles for all users now
https://forum.travelmapping.net/index.php?topic=6238.msg34610#msg34610 https://travelmapping.net/stat.php https://tmrail.teresco.org/stat.php
*puts fingers to keyboard to respond to previous post*
...Crap.
How the... It's pulling this info from the DB. And diffing the "before" and "after" versions showed no changes to the tables...
Anyway.
> I see that you've removed the function but don't get how you log now. Can you point me to the "new" function (call)?
As before, I'm using the overloaded std::ostream::operator <<.
(Overloaded operators are basically just functions with syntactic sugar on top.)
What's different now is how many times it's used and what gets inserted into the `ofstream`.
The changes in the code are too numerous to mention, but here's one example from userlogs:
Before:
A formatted string was prepared, "Traveled %i of %i (%.1f%%), Clinched %i of %i (%.1f%%) preview systems", where the bits starting with `%` were replaced by the 6 numbers on the following 2 lines. This string was then inserted into the output file stream.
After: The components that made up that string are instead individually inserted into the ofstream... Text << number << text << number << text << number << text << number << text << number << text << number << text.
```
yakra@BiggaTomato:~/ytm/DataProcessing/siteupdate/cplusplus$ diff \
  <(tail -n +12258954 _3282/TravelMapping.sql | head -n 3) \
  <(tail -n +12258954 _spr1/TravelMapping.sql | head -n 3) \
  # overallMileageByRegion table
1,3c1,3
< ('AB','9235.1470421475879','18472.872821377736')
< ,('ABW','45.461910136369539','45.461910136369539')
< ,('AFG','2493.3650900241528','2493.3650900241528')
---
> ('AB','0x1.20992d246ef83p+13','0x1.20a37dc4e3228p+14')
> ,('ABW','0x1.6bb1fdf10b2f4p+5','0x1.6bb1fdf10b2f4p+5')
> ,('AFG','0x1.37abaed1463aap+11','0x1.37abaed1463aap+11')
```
I use a custom program to diff the .sql files, which apparently reads & interprets those hex values just fine, converting them to numbers internally. MySQL, on the other hand, doesn't like them so much.
Let's `git checkout` 52ef723 and use the old version on noreaster till I can get this sorted. Probably wanna nix `floatfield`. Me slwwp now. Sleep. Me slwwp... *AHEM* sleep now.
@yakra no urgency on this since it was easy enough to jump back to a previous point.
I'll try some things out with the file writes some time soon, don't hesitate to remind me if I don't get at it soon.
I know exactly what to do. Just going a little overboard on precision & due diligence. Also typing up a post-mortem in #624 on the unlikely series of very specific events that got us here.
I'll have a pull request later today.
Closes #518!
The performance-killing `sprintf` function has been removed from user log, DB file, and graph generation routines. More to come in the future; I've yet to work on NMPs and stats CSVs.

@jteresco, I recommend a compile & test drive of this, if for no other reason than to see how blazingly fast graph generation is now. :grinning: I am curious to see though, how fast `--errorcheck` is with the userlog speedups.

One minor change: the `waypoints` table in the DB has done away with what I call "Python style floats", where a trailing `.0` is written after an integral value. `42.0` -> `42`. This could in theory happen in other tables too; it's just in `waypoints` where we see it in practice. I don't expect it'll cause any problems.

We're nearing the end of the road for big graph generation improvements. Percentagewise, this beats any of the first big five pull requests from 2019. And it wasn't even something I had planned. While I still have a few ideas left, there probably won't be anything else of this magnitude. Just smaller incremental changes, much lower impact. I do expect to get lab2 to more consistently break the 1-second barrier, until the ever-increasing size of the HighwayData repo takes that possibility (using the most recent commit) away permanently.