TravelMapping / DataProcessing

Data Processing Scripts and Programs for Travel Mapping Project
4 stars 6 forks source link

waypoint simplification sorting #285

Closed yakra closed 3 years ago

yakra commented 4 years ago

Thinking about multi-threading the process of Creating unique names and vertices...

yakra commented 3 years ago

Multi-threaded vertex construction is implemented, but performance is a bit underwhelming; it doesn't scale super well. Efficiency peaks at 4 threads, then slows down. Pretty heavy use of mutexes here...

yakra commented 3 years ago

dead branches on BiggaTomato

branch last commit msg comments
insertion 9c297e8 cwn2 cherrypick into main branch
manually recreate
cherrypick into stash
HGVertex
(older)
62765ec set/mutex arrays with original graph point lists instead of vectors
cherrypick into main branch
cwn2Asc e680bdd cwn2Asc straightforward_concurrency component
cwn2Aen 4cbf3c9 cwn2Aen exit_number component
HGVertex_scrap 0e43c6d minor CWN speedups cherrypick -> 12a8ea8
HGVertex_old 9f4794f Python 3+ intersection bugfix
yakra commented 3 years ago
yakra commented 3 years ago
final single-threaded revisions   BT lab1 lab1.5 lab2 lab3
_master
b155944c33c1b5b4470bf1b1b16a42a54fff79ea
5.477 3.277 3.16 4.001 5.68
_insert 5.255 3.166 3.115 3.893 5.664
_gp 5.349 3.125 3.071 3.829 5.659
_func 5.287 3.174 3.065 3.833 5.588
_cwn 5.467 3.123 3.054 3.852 5.465

Nope.

Wrong regex. Used 'unique|Creating edges'; should have been 'Setting|Creating edges'

yakra commented 3 years ago

final single-threaded revisions

execs='_master _insert _gp _func _cwn';
echo -en '\n|| ';
echo $execs | sed 's~ ~ | ~g';
for e in foo $execs; do
 echo -n '| -- ';
done;
echo '|';
for box in BT lab1 lab5 lab2 lab3; do
  echo -en "| $box | ";
  ./table.sh $box sulogs/2021-02-17 'Setting|Creating edges' $execs
  | tail -n 1 | sed "s~$t~ | ~g";
done 2>/dev/null
_master _insert _gp _func _cwn
BT 7.7440 7.5280 6.1650 6.0950 6.2740
lab1 4.4360 4.3330 3.6100 3.6490 3.6080
lab5 4.7700 4.7250 3.6560 3.6430 3.6350
lab2 5.3490 5.2440 4.4150 4.4080 4.4310
lab3 8.0290 8.1860 6.9630 6.7590 6.6380
yakra commented 3 years ago
squashed on BiggaTomato exec
suffix
commit msg comments
_insert 5b2d691 vertex_names membership & insertions • Avoid redundant membership checks + insertion.
• Always attempt to insert. Save result; branch on whether successful.
• Nix the good_to_go boolean.
_gp e5bf367 simplification priority Nix WaypointQuadtree::graph_points, waypoint_simplification_sort, waypoint_simplification_priority, and make separate unsorted lists of hi-priority & lo-priority points per TravelMapping#285. Build more efficiently: Skip devel systems entirely; remove redundant colcation checks. Python too.
_func 5e54c6c separate simplify function Refactor the actual loop that generates unique names and vertices, the one that got touched up 2 commits ago, into a separate function.
_cwn a765338 CWN.new • canonical_waypoint_name receives HighwayGraph* (C++)
• remove extraneous colocation check
• exit_number: remove extraneous list_entry_name() check, as name_no_abbrev() must match
• reversed_border_labels: remove redundant variable
• avoid unnecessary lookup of new straightforward_intersection name
Python too.
_mtx b11f25a mutexes Mutexes for vertex_names and waypoint_naming_log.
_0l de3f7e3 threaded vertex construction First attempt, with graph points interleaved across arrays of lists, similar to threaded WaypointQuadtree::sort.
_0v a153e68 graph point vectors Reduce mutex collisions for per-region/system vertex sets by storing graph points in 2 (hi/lo priority) vectors, and giving each thread a solid nth of the pie to munch on.
_av 6db7e64 set/mutex arrays Reduce mutex collisions for the vertex_names set with arrays of 256 sets & 256 mutexes indexed by the vertex name's final char.
_wvp 439604f create w:v hash after threads finish Reduce mutex collisions for the vertices hash table by storing pair<Waypoint*,HGVertex*> lists for each thread, and building hash table afterwards.
_cwn2Aw 6399f49 cwn2Aw strncmp instead of substr in label_references_route
_cwn2ABC 96920d9 cwn2ABC Replace substr calls in exit_number & straightforward_concurrency.
_cb5be2f cb5be2f progress indicator fix Before: *counter read/written by all threads with no mutex, resulting in too many dots.
After: read/write only in thr[0] -- n times as frequently for 1/n of the data.
_vmtP 7937d3f penultimate delete[] wvp;
terse Py array indexing
_vmtU accdcf6 points->at(w) -> (*points)[w] no need to check for exceptions
ceaf43a restore list_entry_name() check to CWN/exit_number https://github.com/TravelMapping/DataProcessing/issues/285#issuecomment-787122430
_en 1ff7240 exit_number CWN check for name_no_abbrev(##) https://github.com/TravelMapping/DataProcessing/issues/285#issuecomment-787122430
_hgvR 12a8ea8 minor CWN speedups cherrypick 0e43c6d3836b3df2a057271c5eae40ca0f9ce511
9f4794f Python 3+ intersection bugfix
yakra commented 3 years ago
yakra commented 3 years ago

Pending changes will alter the order waypoints are fed into canonical_waypoint_name(), resulting in diffs such as:

diff <(sort ST/logs/waypointsimplification.log) <(sort logs/waypointsimplification.log) | head -n 8

12639c12639
< Appended region: A10@2|PRT
---
> Appended region: A10@2|ESP-NC
12641c12641
< Appended region: A10@5|PRT
---
> Appended region: A10@5|ESP-NC

diff <(sed -r 's~(Appended region: .+)\|.+~\1~' ST/logs/waypointsimplification.log | sort) <(sed -r 's~(Appended region: .+)\|.+~\1~' logs/waypointsimplification.log | sort)

176144c176144
< Straightforward_intersection: A1Lon@B100&B100@A1 -> A1Lon/B100 (A1/B100 already taken)
---
> Straightforward_intersection: A1Lon@B100&B100@A1 -> A1/B100
176189c176189
< Straightforward_intersection: A1War@B100&B100@A1 -> A1/B100
---
> Straightforward_intersection: A1War@B100&B100@A1 -> A1War/B100 (A1/B100 already taken)

Certainly not a big deal in & of itself. Going forward, vertex names will be the same each time with siteupdateST & siteupdate.py. So far so good.

Multithreaded, points aren't processed in a deterministic order. There will be diffs such as these between successive runs. A new option of siteupdate -v or siteupdate --st-vertices forces single-threaded waypoint simplification, resulting in the same vertex names each time. Useful for development.

@jteresco, what's more useful to you & your students? If preferred, I can easily make single-threaded waypoint simplification the default behavior, and change the --st-vertices option to --mt-vertices. The time difference is only about 1.7 s on BiggaTomato, lol. Sure to be even smaller on faster machines.

jteresco commented 3 years ago

Catching up on some things. I think it's not that urgent to maintain consistency when one waypoint label gets simplified and the other cannot, but it seems like it's easy enough to do and probably should be done to avoid potential confusion.

yakra commented 3 years ago

it seems like it's easy enough to do and probably should be done to avoid potential confusion.

OK. I'll make single-threaded simplification the default, with -v or --mt-vertices optional.

yakra commented 3 years ago

I became too obsessed with achieving a squeaky-clean commit history, wound up with 4 different branches taking slightly different development paths toward the same final product, then got a bit burned out & distracted by other things for almost 2 months.

Time to remember where I was in the process, clean up, and pick up where I left off. I hope to strike a balance between... • a commit history that's easy to follow • a commit history that's clean (a little overlap though not the same) • maintaining my sanity

yakra commented 3 years ago

@jteresco wrote:

I think it's not that urgent to maintain consistency when one waypoint label gets simplified and the other cannot, but it seems like it's easy enough to do and probably should be done to avoid potential confusion.

@yakra wrote:

OK. I'll make single-threaded simplification the default, with -v or --mt-vertices optional.

This is also the most convenient for me from a development standpoint. Multi-threaded waypointsimplification.log files aren't DIFFable without some heavy preprocessing. Single-threaded files can be compared more easily.

To filter out the diffs caused by multithreading:

diff \
<(sed -r \
  -e 's~\|.*~~' \
  -e 's~.*: ~~' \
  -e 's~^(.+)([A-Za-z]{0,3})@(.+)([A-Za-z]{0,3})(_?[A-Za-z]{0,4})&\3([A-Za-z]{0,3})@\1\2(_?[A-Za-z]{0,4}) -> \1\2\7/\3\6\5 \(\1\7/\3\5 already taken\)$~\1\2@\3\4\5\&\3\6@\1\2\7 -> \1\7/\3\5~' \
  -e 's~^(.+)([A-Za-z]{0,3})@(.+)([A-Za-z]{0,3})(_?[A-Za-z]{0,4})&\3([A-Za-z]{0,3})@\1(_?[A-Za-z]{0,4}) -> \1\2\7/\3\6\5 \(\1\7/\3\5 already taken\)$~\1\2@\3\4\5\&\3\6@\1\7 -> \1\7/\3\5~' \
    ST/logs/waypointsimplification.log \
  | sort) \
<(sed -r \
  -e 's~\|.*~~' \
  -e 's~.*: ~~' \
  -e 's~^(.+)([A-Za-z]{0,3})@(.+)([A-Za-z]{0,3})(_?[A-Za-z]{0,4})&\3([A-Za-z]{0,3})@\1\2(_?[A-Za-z]{0,4}) -> \1\2\7/\3\6\5 \(\1\7/\3\5 already taken\)$~\1\2@\3\4\5\&\3\6@\1\2\7 -> \1\7/\3\5~' \
  -e 's~^(.+)([A-Za-z]{0,3})@(.+)([A-Za-z]{0,3})(_?[A-Za-z]{0,4})&\3([A-Za-z]{0,3})@\1(_?[A-Za-z]{0,4}) -> \1\2\7/\3\6\5 \(\1\7/\3\5 already taken\)$~\1\2@\3\4\5\&\3\6@\1\7 -> \1\7/\3\5~' \
    logs/waypointsimplification.log \
  | sort)
yakra commented 3 years ago

Mutexes and Scaling

Vert This is another one of those cases where too big of a mutex footprint kills performance.

yakra commented 3 years ago

Two more possibilities for reducing the impact of mutexes:

How much impact from...

yakra commented 3 years ago

^ Re the 2nd bullet point, ignoring the 1st: Another CPU vs RAM trade-off, like what we saw in #404 only much less pronounced. Alternatives that perform better with fewer threads perform worse with more, and vice versa. Vert_set 5d22a4 (the darker solid lines) has some added cost in creating temporary containers and then iterating through them. This is more than made up for by not having to lock and unlock mutexes all the time, which comes with its own computational costs even when there's nothing to lock, no risk of collisions. Hence the decent speedup even at 1 thread. However, building these lists is more memory intensive. This is fine with relatively few threads needing bandwidth, but once we add in a load of threads and start saturating the memory bus, performance starts to lag a wee bit.

To implement, or not?