[Bug] PyGRASS: VectorTopo.viter appears to yield bad/duplicate points.

Viech commented 3 years ago

Describe the bug I am using v.net to generate points for a network so that I can assign attributes to them. When I load these points with VectorTopo in PyGRASS (using VectorTopo.viter("points")), I find that some of their coordinates are wrong. More precisely, when I draw the points (using Python tools unrelated to GRASS), it appears that some points are missing while others are drawn multiple times in the same location. When I load the network's nodes instead of the points (using VectorTopo.viter("nodes")), the network is drawn just fine and all nodes are where I expect them.

To Reproduce I made a Python script to reproduce the problem. I did not yet learn how to export the TEST vector map for you to reproduce locally, however the error should become apparent from the output of the script:

#!/usr/bin/python

from grass.pygrass.modules.shortcuts import vector as v
from grass.pygrass.vector import VectorTopo

print("TEST toplogy:")
v.info(map="TEST", flags="t")

print("\nGenerating points at node locations...")
v.net(
    input="TEST",
    operation="nodes",
    output="TEST_WITH_POINTS",
    overwrite=True,
    quiet=True,
)

print("\nTEST_WITH_POINTS toplogy:")
v.info(map="TEST_WITH_POINTS", flags="t")

print("\nLoading sorted node and point coordinates with VectorTopo.viter...")
with VectorTopo("TEST_WITH_POINTS", mode="r") as features:
    node_coords = sorted(node.coords() for node in features.viter("nodes"))
    point_coords = sorted(point.coords() for point in features.viter("points"))

print("\nComparison of 'node' and 'point' coordinates:")
for i, (p, n) in enumerate(zip(node_coords, point_coords)):
    n = node_coords[i]
    p = point_coords[i]
    print("{:3d}: {} {} {}".format(i, n, "==" if n == p else "!=", p))

print("\nAre there as many points as nodes?")
print(len(node_coords) == len(point_coords))

print("\nAre point coordinates a subset of node coordinates?")
print(set(point_coords).issubset(set(node_coords)))

print("\nAre node coordinates a subset of point coordinates?")
print(set(node_coords).issubset(set(point_coords)))

This outputs the following (some lines cut):

TEST toplogy:
nodes=200
points=0
lines=313
boundaries=0
centroids=0
areas=0
islands=0
primitives=313
map3d=0

Generating points at node locations...
WARNING: Vector map <TEST_WITH_POINTS> already exists and will be
         overwritten

TEST_WITH_POINTS toplogy:
nodes=200
points=200
lines=313
boundaries=0
centroids=0
areas=0
islands=0
primitives=513
map3d=0

Loading sorted node and point coordinates with VectorTopo.viter...

Comparison of 'node' and 'point' coordinates:
  0: (590450.0, 4922850.0) != (590650.0, 4922950.0)
  1: (590650.0, 4922950.0) == (590650.0, 4922950.0)
  2: (590950.0, 4917950.0) == (590950.0, 4917950.0)
  3: (591050.0, 4915950.0) != (590950.0, 4917950.0)
  4: (591050.0, 4922550.0) != (591050.0, 4915950.0)
  5: (591250.0, 4922350.0) != (591050.0, 4922550.0)
  6: (591550.0, 4921750.0) != (591550.0, 4925250.0)
  7: (591550.0, 4925250.0) == (591550.0, 4925250.0)
  8: (591650.0, 4916050.0) == (591650.0, 4916050.0)
  9: (591650.0, 4920850.0) == (591650.0, 4920850.0)
 10: (591650.0, 4921350.0) != (591650.0, 4924350.0)
 11: (591650.0, 4924350.0) == (591650.0, 4924350.0)
 12: (591650.0, 4924950.0) == (591650.0, 4924950.0)
 13: (591750.0, 4920150.0) != (591650.0, 4924950.0)
 14: (591850.0, 4921550.0) != (591750.0, 4920150.0)
 15: (591850.0, 4921650.0) != (591850.0, 4921550.0)
 16: (591950.0, 4921650.0) != (591850.0, 4921650.0)
 17: (591950.0, 4922050.0) != (591950.0, 4921650.0)
 18: (591950.0, 4922250.0) != (591950.0, 4921650.0)
 19: (591950.0, 4922950.0) != (591950.0, 4922050.0)
 20: (591950.0, 4924550.0) == (591950.0, 4924550.0)
 21: (592050.0, 4921650.0) == (592050.0, 4921650.0)
 22: (592050.0, 4922050.0) != (592050.0, 4921650.0)
 23: (592050.0, 4922950.0) != (592050.0, 4922050.0)
 24: (592150.0, 4921150.0) != (592050.0, 4922050.0)
[…]
175: (604150.0, 4926150.0) != (604450.0, 4921250.0)
176: (604250.0, 4926050.0) != (604450.0, 4925750.0)
177: (604350.0, 4921450.0) != (604450.0, 4925850.0)
178: (604350.0, 4925950.0) != (604450.0, 4925850.0)
179: (604450.0, 4921250.0) != (604550.0, 4915450.0)
180: (604450.0, 4925750.0) != (604550.0, 4915450.0)
181: (604450.0, 4925850.0) != (604550.0, 4925850.0)
182: (604550.0, 4915450.0) != (606250.0, 4920550.0)
183: (604550.0, 4925850.0) != (606450.0, 4920350.0)
184: (606250.0, 4920550.0) != (606450.0, 4920350.0)
185: (606450.0, 4920350.0) != (606550.0, 4920650.0)
186: (606550.0, 4916550.0) != (606850.0, 4920650.0)
187: (606550.0, 4920650.0) != (606850.0, 4920650.0)
188: (606650.0, 4920750.0) != (607050.0, 4917850.0)
189: (606850.0, 4920650.0) != (607150.0, 4916650.0)
190: (606950.0, 4916650.0) != (607150.0, 4916650.0)
191: (607050.0, 4917850.0) != (607150.0, 4916650.0)
192: (607150.0, 4916650.0) != (607150.0, 4916850.0)
193: (607150.0, 4916850.0) == (607150.0, 4916850.0)
194: (607150.0, 4916950.0) == (607150.0, 4916950.0)
195: (607250.0, 4916750.0) != (607150.0, 4916950.0)
196: (607450.0, 4926350.0) != (607250.0, 4916750.0)
197: (608150.0, 4916450.0) == (608150.0, 4916450.0)
198: (608250.0, 4915850.0) == (608250.0, 4915850.0)
199: (608350.0, 4926450.0) == (608350.0, 4926450.0)

Are there as many points as nodes?
True

Are point coordinates a subset of node coordinates?
True

Are node coordinates a subset of point coordinates?
False

Expected behavior In the above output, all comparison lines should read == and none should read !=. All questions should be answered with True.

Note in particular that the script sorts the coordinates, so the problem is not that the points and nodes would be retrieved in different orders. Instead the returned sets of coordinates are not equal. It can be seen from the answers to the three questions above that some nodes have multiple points placed in their location while some nodes have no points placed in their location.

Screenshots I can add images of the drawings, if it helps.

System description

Operating System: Linux 5.11.16-arch1-1
GRASS GIS version: 7.8.5-1

% g.version -rge
version=7.8.5
date=2020
revision=4e3f393
build_date=2021-04-27
build_platform=x86_64-pc-linux-gnu
build_off_t_size=8
libgis_revision=2021-04-27T07:33:34+00:00
libgis_date=2021-04-27T10:00:00+02:00
proj=8.0.0
gdal=3.2.2
geos=3.9.1
sqlite=3.35.5

% python3 -c "import sys, wx; print(sys.version); print(wx.version())" 
3.9.4 (default, Apr 20 2021, 15:51:38) 
[GCC 10.2.0]
4.1.1 gtk3 (phoenix) wxWidgets 3.1.5

Additional context The TEST network was generated by me using v.clean tool=break,rmdup on a number of least cost paths (r.cost and r.path) patched together with v.patch.

Viech commented 3 years ago

The following works fine (compare with original script):

print("\nLoading node and point coordinates...")
point_coords = []
with VectorTopo("TEST_WITH_POINTS", mode="r") as features:
    node_coords = tuple(node.coords() for node in features.viter("nodes"))
    for f in features:
        if isinstance(f, Point):
            point_coords.append(f.coords())

So the culprit appears to be VectorTopo.viter.

Viech commented 3 years ago

I've worked around the issue as explained above but I'd like to share my suspicion that viter should maybe respect an ID offset as currently ids are defined within viter starting at 1 unconditionally (source):

ids = (indx for indx in range(1, self.number_of(vtype) + 1))

I believe IDs are assigned per feature and not per feature type? This could mean that incorrect IDs are assigned here and some lookups would then access the data of lines instead of points. Since points are placed at line nodes and multiple lines can share a node, this could explain the duplicates.

aaronsms commented 3 years ago

I can take a look at this, could you please provide a vector map or perhaps steps to produce a vector map? I believe the ID is meant to start on 1, namely we will have [Point(1..200) and Node(1..200)], the offsets (at least from what I observe for the case of Node) is handled at lower levels during instantiation. However, on the first glance, this is not the case for Point, where we read from the entire map_info, instead of just the features belonging to Point, using c_read_line. I suspect (like you have mentioned), the viter for Node is accurate but the viter for Point has interleaving Nodes in between.

Under class Point,

    def __init__(self, x=0, y=0, z=None, **kargs):
        super(Point, self).__init__(**kargs)
        if self.id and self.c_mapinfo:
            self.read()    # Here, we do a read that uses c_read_line that reads from all lines instead of just Point

Viech commented 3 years ago

I think this can be reproduced with pretty much any line network that points are added to using v.net operation=nodes. For instance, you can copy the roads map in the spearfish location to TEST (run v.db.reconnect.all to make it use sqlite) and then run the script in my opening post. You will see only a few node/point pairs with matching coordinates.

OSGeo / grass

[Bug] PyGRASS: VectorTopo.viter appears to yield bad/duplicate points. #1560