TravelMapping / DataProcessing

Data Processing Scripts and Programs for Travel Mapping Project
4 stars 6 forks source link

collapsed edges outside area graph radius #613

Open yakra opened 7 months ago

yakra commented 7 months ago

My intuition told me that in every simple/collapsed/traveled graph trio, the number of simple vertices minus the number of collapsed vertices = the number of simple edges minus the number of collapsed edges. Same for simple-traveled & traveled-collapsed. The idea is, whenever we lose 1 vertex due to a collapse, 2 edges become 1 and we're down 1, right? Right?

I bashed together a shell script (pun intended) to test this assumption, and it gave amsterdam10-area as a counterexample. Conveniently, an area @jteresco knows well. :) From simple to collapsed, we're down 8 vertices & 7 edges. What gives?

Loading up the HDX in 2 browser tabs, 3 differences stand out...

  1. In the north, the simple graph has a short edge ending at a hidden point west of NY30/NY349. In the collapsed graph, that hidden point is no longer a vertex in its own right, and the would-be edge extends farther W to NY349@CR154 -- which is outside the 10 mi radius. The edge is not included because it ends at a vertex that's not included. So far so good.
  2. In the SE quadrant, the simple graph has a long stretch of Thruway E of exit 27 that the collapsed graph doesn't. At first, this looks suspect, at least at the zoom level it loads up in my browser. Zoom in and you'll see this segment ends at a shaping point very close to NY5S/NY160. In the end it's the same deal as above; we have a simple edge ending at a hidden vertex within the radius, but no collapsed edge, because the next visible vertex is too far away.
  3. But what's this? The collapsed graph has an extra edge! In the northwest, NY30A, NY29_E/NY30A_N <-> NY29A/NY30A, Johnstown to Gloversville. What's happening here is that there's a hidden point, NY30A@+X02, 10.16 mi away from the PlaceRadius center.
    • It's not in the simple graph, so neither are the 2 edges that connect to it.
    • And it's not in the collapsed graph either, at least, not as a vertex. What it is is just a lat/long pair listed as an intermediate_point along the tmg edge line (1 8 NY30A 43.024153 -74.360275). We don't check intermediate_points for PlaceRadius membership, but the 2 endpoints are in, so the edge is included.
    • If somebody were to .list travels starting/ending at NY NY30A X02, this would remain a bona fide vertex in the traveled graph, and fail to collapse. Its 2 incident edges would end at a vertex outside the PlaceRadius and not be included, same as in the simple graph. This happens a couple times in mhc25-area-traveled.tmg.

In short? Edges that go outside the radius (at their ends) are obviously excluded. But edges that go outside the radius between the ends can still be included.

Should edges with intermediate_points outside the radius be excluded from area graphs?

@jteresco, thoughts?

yakra commented 7 months ago

Shell script to find counterexamples:

for g in `ls | egrep -v '\-simple\.tmg$|\-traveled.tmg$' | sed 's~\.tmg$~~'`; do
  cv=`head -n 2 $g.tmg | tail -n 1 | cut -f1 -d' '`
  ce=`head -n 2 $g.tmg | tail -n 1 | cut -f2 -d' '`
  tv=`head -n 2 $g-traveled.tmg | tail -n 1 | cut -f1 -d' '`
  te=`head -n 2 $g-traveled.tmg | tail -n 1 | cut -f2 -d' '`
  sv=`head -n 2 $g-simple.tmg | tail -n 1 | cut -f1 -d' '`
  se=`head -n 2 $g-simple.tmg | tail -n 1 | cut -f2 -d' '`

  if [ `expr $sv - $cv` != `expr $se - $ce` ]; then echo $g S-C mismatch; fi
  if [ `expr $sv - $tv` != `expr $se - $te` ]; then echo $g S-T mismatch; fi
  if [ `expr $tv - $cv` != `expr $te - $ce` ]; then echo $g T-C mismatch; fi
done

See vertex & edge counts for the 7 affected areas:

for g in amsterdam10-area la125-area mhc25-area rmu25-area siena25-area stlouis25-area westfield5-area; do
  echo $g
  for file in $g-simple $g $g-traveled; do
    head -n 2 $file.tmg | tail -n 1;
  done
  echo
done
jteresco commented 7 months ago

Thanks for digging in like this. It's definitely not a situation I considered. It's really good with me either way on whether edges with both endpoints in the radius but intermediate points outside end up being included in the PlaceRadius graphs. It sounds like it makes some things simpler if we exclude them, and it's an easy fix to do so. So if that's an accurate statement, let's exclude them.

yakra commented 7 months ago

Thanks for digging in like this. It's definitely not a situation I considered.

LOL, even though graph generation speed is still officially on the backburner, I had a flurry of ideas, and had to jot some stuff down & test out a few ideas before forgetting it all.

It sounds like it makes some things simpler if we exclude them, and it's an easy fix to do so. So if that's an accurate statement, let's exclude them.

Makes me wonder if you read an earlier version the OP; I originally had one last bullet point:

  • If it makes my assumption at the beginning of this post true, it'd make a potential future improvement easier to code.

...but deleted that after realizing the scenario I was envisioning wouldn't work. (I could MAKE the scenario work, but in doing so would bring about an easier way to do things.) That said, it could still make other things easier, things I haven't thought of yet. Either way, definitely an easy fix.

It's really good with me either way on whether edges with both endpoints in the radius but intermediate points outside end up being included in the PlaceRadius graphs.

This being said, I'll call this low priority for now, and hold off. One of the more lightweight upcoming changes would mean rewriting this.

In any case, IMO it's The Right Thing To Do.