Closed yakra closed 1 month ago
Fix is implemented. 3 new lines of code, not counting a comment and one that's just a }.
Looks like these cases are quite common in RailwayData; 47 of them. Still checking my work.
#diff size for each file not including final TOTAL row
files=$( \
for f in $(tail -n +3 d4a3_conc.l0g | head -n 34 \
| sed -r -e "s~$e\[[0-9]+m~~" -e 's~.*_d4a3/stats/(.+) and .*~\1~'); do
printf "%02i $f\n" $(diff <(head -n -1 _d4a3/stats/$f) <(head -n -1 _conc/stats/$f) | wc -l);
done)
#get deltas from the last lines of those without diffs, formatted to paste into the smbr spreadsheet tab
for f in $(echo "$files" | grep ^00 | cut -f2 -d' '); do
# get column numbers
cols=$(diff <(tail -n 1 _d4a3/stats/$f | sed -e 's~,~\n\n~g' -e 's~TOTAL~\n0~') \
<(tail -n 1 _conc/stats/$f | sed -e 's~,~\n\n~g' -e 's~TOTAL~\n0~') \
| egrep '([0-9]+)c\1' | sed -r 's~([0-9]+)c\1~\1/2~' | bc | tail -n +2) # skip trav total column
for col in $cols; do
rg=`head -n 1 _d4a3/stats/$f | cut -f$col -d,`
d=`paste -d- <(tail -n 1 _d4a3/stats/$f | cut -f$col -d,) \
<(tail -n 1 _conc/stats/$f | cut -f$col -d,) | bc`
echo $f | sed "s~-all.csv~\t$rg\t$d~"
done
done
#do those with traveler diffs
for f in $(echo "$files" | grep -v ^00 | cut -f2 -d' '); do
#get row numbers
rows=$(diff _d4a3/stats/$f _conc/stats/$f \
| egrep '([0-9]+,[0-9]+)c\1|([0-9]+)c\2' \
| sed -e 's~c.*~~' -e 's~,~ ~')
for row in $rows; do
l1=`tail -n +$row _d4a3/stats/$f | head -n 1`
l2=`tail -n +$row _conc/stats/$f | head -n 1`
t=`echo $l1 | cut -f1 -d,`
# get column numbers
cols=$(diff <(echo; echo $l1 | sed -e 's~,~\n\n~g') \
<(echo; echo $l2 | sed -e 's~,~\n\n~g') \
| egrep '([0-9]+)c\1' | sed -r 's~([0-9]+)c\1~\1/2~' | bc)
for col in $cols; do
rg=`head -n 1 _d4a3/stats/$f | cut -f$col -d,`
d=`paste -d- <(echo $l1 | cut -f$col -d,) <(echo $l2 | cut -f$col -d,) | bc`
echo $f | sed "s~[-.].*~\t$rg\t$t\t$d~"
done
done
done
grep & compare...
| grep Total.*TOTAL
per-system & site-wide totals all match routedatastats.log
#set $table to the result of that last big shell script block, and then...
users=`echo "$table" | cut -f3 | sort | uniq`
for u in $users; do clear; echo "$table" | grep $u; pluma _{d4a3,conc}/logs/users/$u.log; read OK; done
# ...until getting to "TOTAL". This all matches userlogs. Then...
echo "$table" | grep -v Total | grep $TOTAL
routedatastats.log Breakdown by region
routedatastats.log Breakdown by system
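For what it's worth, the per-column deltas that the stats script above pulls out with diff, cut & bc could also be sketched as a single awk pass. This is only an illustration, not what was actually run: csv_row_deltas is a hypothetical helper, and it assumes plain comma-separated rows with a label in column 1 and numbers everywhere else.

```shell
# Hypothetical helper: print "column-name<TAB>delta" (old minus new) for
# every column after the first whose value differs between two CSV rows.
# $1 = header row, $2 = old row, $3 = new row
csv_row_deltas() {
  printf '%s\n%s\n%s\n' "$1" "$2" "$3" | awk -F, '
    NR == 1 { for (i = 1; i <= NF; i++) name[i] = $i; next }  # column names
    NR == 2 { for (i = 1; i <= NF; i++) old[i] = $i; next }   # old values
    NR == 3 {
      # column 1 is the row label (traveler name or TOTAL), so start at 2
      for (i = 2; i <= NF; i++)
        if ($i != old[i]) printf "%s\t%s\n", name[i], old[i] - $i
    }
  '
}
```

Something like csv_row_deltas "$(head -n 1 _d4a3/stats/$f)" "$(tail -n 1 _d4a3/stats/$f)" "$(tail -n 1 _conc/stats/$f)" would then list the differing columns of the last row for one file. Unlike the script above, this sketch does not skip the first differing column.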
#find new concurrencies in concurrencies.log & search for augments based on them
cl=$( \
fgrep -f \
<(diff <(grep -v augment _d4a3/logs/concurrencies.log) \
<(grep -v augment _conc/logs/concurrencies.log) \
| grep '^>' | sed -r 's~^> New concurrency (\[.+\])(\[.+\]) \(2\)$~\1 based on \2\n\2 based on \1~') \
_conc/logs/concurrencies.log \
| sed -r 's~Concurrency augment for traveler (.+) based on .+~\1~')
#convert DB lines to the same style
db=$( \
for e in $(diff <(tail -n +354277 _d4a3/TravelMapping.sql | head -n 212997 | sed s/^,// | sort) \
<(tail -n +354277 _conc/TravelMapping.sql | head -n 213005 | sed s/^,// | sort) \
| grep '^>' | cut -f2 -d" "); do
t=`echo "$e" | cut -f4 -d"'"`
id=`echo "$e" | cut -f2 -d"'"`
s_line=`tail -n +186429 _d4a3/TravelMapping.sql | head -n 167848 | egrep "^,?\('$id'"`
r=`echo "$s_line" | cut -f8 -d"'"`
r=`tail -n +1494 _d4a3/TravelMapping.sql | head -n 5695 | grep "$r'"`
r=`(echo "$r" | cut -f4 -d"'"; echo "$r" | cut -f6 -d"'") | tr '\n' ' '`
i=`echo "$s_line" | cut -f4 -d"'"`
w_pair=`tail -n +12887 _d4a3/TravelMapping.sql | head -n 173542 | grep -A 1 "^,\?('$i'" | tr -d '\n'`
w=`echo "$w_pair" | cut -f4 -d"'"`
p=`echo "$w_pair" | cut -f14 -d"'"`
echo "$t: [$r$w $p]"
done)
#now sort & diff them
diff <(echo "$cl" | sort) <(echo "$db" | sort)
graphs=$(diff -qr _d4a3/graphs/ _conc/graphs/ | cut -f3 -d/ | sed -e 's~\.tmg.*~~' -e 's~-simple$\|-traveled$~~' | uniq)
allgraphs=$(diff -qr _d4a3/graphs/ _conc/graphs/ | cut -f3 -d/ | sed -e 's~\.tmg.*~~')
for g in $graphs; do
diff <(head -n 2 _d4a3/graphs/$g-simple.tmg | tail -n 1 | cut -f1 -d' ') \
<(head -n 2 _conc/graphs/$g-simple.tmg | tail -n 1 | cut -f1 -d' ')
done
for g in $graphs; do
printf "%4i $g\n" \
$(paste -d- <(head -n 2 _d4a3/graphs/$g-simple.tmg | tail -n 1 | cut -f2 -d' ') \
<(head -n 2 _conc/graphs/$g-simple.tmg | tail -n 1 | cut -f2 -d' ') \
| bc)
done | grep -e -region
| grep -e -country instead
| grep -e -continent instead
VT EA Rut
The number of vertices differs in some collapsed graphs. Makes sense though.
The difference is in whether or not a vertex's incident edges are collapsed around it, causing it to go from a bona fide visible vertex to one of the intermediate_points along a collapsed edge.
e=`echo -en '\e'`
for g in $graphs; do
oldv=`head -n 2 _d4a3/graphs/$g.tmg | tail -n 1 | cut -f1 -d' '`
newv=`head -n 2 _conc/graphs/$g.tmg | tail -n 1 | cut -f1 -d' '`
d=`diff <(tail -n +3 _d4a3/graphs/$g.tmg | head -n $oldv) \
<(tail -n +3 _conc/graphs/$g.tmg | head -n $newv) | grep '^[<>]'`
if [ `echo "$d" | wc -w` -gt 0 ]; then echo -e "$g\n$d"; fi
done \
| sed -e "s~^<.*~$e[31m&$e[0m~" \
-e "s~^>.*~$e[32m&$e[0m~"
Each line item diff falls into 1 of 2 categories:
Each edge line in a TMG file starts with 2 indices into the vertex array. Meaning whenever a vertex is added or removed, all the others after it have different indices, and all references to them by edge lines change. Thus TMGs aren't diffable out of the box. The solution to this is to replace the numbers on the edge lines with a unique stable identifier. What better than the vertex labels? I won't be able to load these into the HDX, but so what; I only care about diffing them.
for g in `ls | grep -v 'simple\|traveled'`; do
echo $g
v_count=`tail -n +2 $g | head -n 1 | cut -f1 -d' '`
e_count=`tail -n +2 $g | head -n 1 | cut -f2 -d' '`
head -n `expr $v_count + 2` $g > ../schmaphs/$g
beg=`expr $v_count + 3`
end=`expr $e_count - 1 + $beg`
e=1
for lnum in $(seq $beg $end); do
line=`tail -n +$lnum $g | head -n 1`
i1=`echo "$line" | cut -f1 -d' '`
i2=`echo "$line" | cut -f2 -d' '`
v1=`expr $i1 + 3`
v2=`expr $i2 + 3`
v1=`tail -n +$v1 $g | head -n 1 | cut -f1 -d' '`
v2=`tail -n +$v2 $g | head -n 1 | cut -f1 -d' '`
echo $line | sed "s~^$i1 $i2 ~$v1 $v2 ~" >> ../schmaphs/$g
echo -en "$e/$e_count\r"
e=`expr $e + 1`
done
done
LOL, this took 3 hours and 8 seconds to run, and in the end didn't even work right. While waiting for it to finish, I rewrote it in C++ so the 2nd batch would go faster.
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <utility>
#include <vector>
using namespace std;
int main(int argc, char *argv[])
{ cout << argv[1] << endl;
ifstream ifile(argv[1]);
ofstream ofile(string("../schmaphs/")+argv[1]);
vector<string> L;
string line;
while (getline(ifile, line)) L.push_back(move(line));
ifile.close();
char* endptr;
size_t v_count = strtoul(&L[1][0], &endptr, 10);
size_t e_count = strtoul(endptr+1, &endptr, 10);
ofile << L[0] << '\n' << L[1] << '\n';
size_t ebeg = v_count + 2;
size_t eend = e_count + ebeg;
size_t lnum = 2;
for (; lnum < ebeg; lnum++)
ofile << L[lnum] << '\n';
for (; lnum < L.size(); lnum++)
{ string& l = L[lnum];
cout << &l - &L[0] << '/' << e_count << '\r';
char *p1, *p2;
size_t i1 = strtoul(&l[0], &p1, 10) + 2;
size_t i2 = strtoul(p1+1 , &p2, 10) + 2;
ofile << L[i1].substr(0, L[i1].find(' ')) << ' ';
ofile << L[i2].substr(0, L[i2].find(' ')) << p2 << '\n';
}
}
This took 4.6 seconds. LOL.
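A middle ground between the two might be a single awk pass, which reads each file once instead of re-running tail & head for every edge. A minimal sketch, assuming the same layout the C++ version relies on: a header line, then a "vertex_count edge_count" line, then the vertex lines, then edge lines starting with two 0-based vertex indices, all single-space separated. relabel_tmg is a made-up name, and this was not part of the original runs.

```shell
# Hypothetical helper: print a TMG file with the two leading vertex
# indices on each edge line replaced by the corresponding vertex labels.
# $1 = path to a .tmg file
relabel_tmg() {
  awk '
    NR == 1 { print; next }                  # header line, copied as-is
    NR == 2 { v = $1; print; next }          # counts line; remember vertex count
    NR <= v + 2 {                            # vertex lines: index 0 is on line 3
      label[NR - 3] = $1
      print
      next
    }
    {                                        # everything after: edge lines
      $1 = label[$1]                         # field assignment, so "&" in a
      $2 = label[$2]                         # label needs no escaping
      print
    }
  ' "$1"
}
```

Usage would be along the lines of for g in *.tmg; do relabel_tmg "$g" > ../schmaphs/"$g"; done. Assigning to the fields rather than using a sed-style substitution sidesteps any special characters in the labels, at the cost of re-joining each edge line with single spaces.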
Looking at the diffs between graph edges shows that diffs don't just occur one-by-one. They come in clusters, in one of three patterns. These patterns can be mixed & matched in a graph, and occur however many times.
< Vertex adjacent to the U-turn point goes from being incorrectly considered a hidden junction to getting collapsed
< Simple edge from the U-turn point to the deleted adjacent point
< Simple edge connecting the same points in the opposite direction
< Edge connecting the deleted point to some other point farther down the line
> Collapsed edge connecting the U-turn to the faraway point
Happens when the U-turn vertex is visible & the adjacent vertex is collapsible.
> U-turn vertex goes from being incorrectly collapsed to visible per having 1 incident edge
> Simple edge from the U-turn to the adjacent point
< Collapsed edge connecting the adjacent point to itself via the U-turn as an intermediate
Happens when the U-turn vertex is hidden & the adjacent vertex is visible or a junction.
< Simple edge from the U-turn point to the adjacent point
< Simple edge connecting the same points in the opposite direction
> A new simple edge from the U-turn to the adjacent point again. Only this time, label has a doubled route name, e.g. "DE,DE"
Happens when the U-turn & adjacent vertices are both visible.
< Shaping point adjacent to the U-turn point goes from being incorrectly considered a hidden junction to getting collapsed
> U-turn vertex goes from being incorrectly collapsed to visible per having 1 incident edge
< Edge connecting the deleted point to some other point farther down the line
< Collapsed edge connecting the adjacent point to itself via the U-turn as an intermediate
> Collapsed edge connecting the U-turn to the faraway point
Happens when the U-turn & adjacent vertices are both hidden.
If collapsed graphs make sense, traveled graphs will be fine too. The decision whether or not to collapse has nothing to do with the new concurrencies at this point; no new info to be gained. Nonetheless, this...
trav=$( \
for g in $graphs; do \
d=$(diff <(tail -n +2 _conc/graphs/$g.tmg | head -n 1) \
<(tail -n +2 _conc/graphs/$g-traveled.tmg | head -n 1 | sed -r 's~(.+ .+) .*~\1~') \
| grep '^[<>]')
if [ `echo "$d" | wc -w` -gt 0 ]; then echo "$g"; fi; done)
e=`echo -en '\e'`
for g in $trav; do
cv=$(tail -n +2 _conc/graphs/$g.tmg | head -n 1 | cut -f1 -d' ')
tv=$(tail -n +2 _conc/graphs/$g-traveled.tmg | head -n 1 | cut -f1 -d' ')
echo "$e[1m$g$e[0m"
d=$(diff <(tail -n +3 _conc/graphs/$g.tmg | head -n $cv) \
<(tail -n +3 _conc/graphs/$g-traveled.tmg | head -n $tv) | grep '^[<>]')
for l in $(echo "$d" | tr ' ' '%'); do
echo "$l" | tr '%' ' '
fgrep -ir -f <(echo "$l" | cut -f2 -d@ | cut -f1 -d% | sed 's~^+~~') ~/tm/UserData/rlist_files
done
done
...gives strong (though not foolproof) evidence that all uncollapsed vertices are used in .lists. Only 2 points gave no results, which were found manually:
Oce/OttQue@+DIV_Cha_W in use as +DIV_Cha on the other route that didn't get the pick for the hidden vertex label
AirTrn@+SKIP_EmpOnly used via an AltRouteName, TA IIRC
Uncovered some concurrency problem while trying to expand the HIDDEN_JUNCTION datacheck to catch cases that'd produce a segment name mismatch error (#91, #178, #603) in datacheck mode before making it to the graph generation process.
(eeea42f): ME DE +DIV_Por1 Por is not flagged as concurrent with ME DE Por +DIV_Por2.
PA US19TrkPit I-376(69B)_S I-376(69A) is concurrent with PA US19TrkPit I-376(69A) I-376(69B)_N as expected (7a955f2).
The fact that the DIV points are hidden does not affect this.
The first thing that comes to mind is that ME DE is not concurrent with any other route, whereas PA US19TrkPit has 4 others in the mix. IIRC this fact caused wacky hijinks in my first epic concurrency debugathon back in 2018. OK yeah, that's it. Thank God, I didn't wanna do another one. https://github.com/TravelMapping/DataProcessing/blob/e280bd58d47f89cb4b64d7450dc666bfe1ad038e/siteupdate/cplusplus/tasks/concurrency_detection.cpp#L11
In this case p[1] is not colocated. It's a single point on a single route that's doing a 180° turn. Something like this would be overkill & less efficient; I should be able to move an if and bung in an else block.