NetTopologySuite / NetTopologySuite.IO.ShapeFile

The ShapeFile IO module for NTS.
33 stars 25 forks source link

Read a polygon with a hole from a ShapeFile #80

Open DGuidi opened 2 years ago

DGuidi commented 2 years ago

To investigate #70

DGuidi commented 2 years ago

from Shapefile specs, page 8-9

The order of vertices or orientation for a ring indicates which side of the ring is the interior of the polygon... Vertices of rings defining holes in polygons are in a counterclockwise direction. Vertices for a single, ringed polygon are, therefore, always in clockwise order. The order of rings in the points array is not significant.

so basically testing for rings CW and CCW orientation is the correct way of doing things. Maybe is this code that try to clean data that is inherently dirty?

DGuidi commented 2 years ago

The shell_bad_ccw.shp contains a single polygon, with:

The idea of this PR is to let the ShapefileReader reads this kind of data as two separate shells, so a MultiPolygon is returned.

FObermaier commented 2 years ago

The idea of this PR is to let the ShapefileReader reads this kind of data as two separate shells, so a MultiPolygonis returned.

Does one polygon contain the other? In that case the result would be invalid.

DGuidi commented 2 years ago

Does one polygon contain the other? In that case the result would be invalid.

Yep, exactly (made explicit in the unit test).

I see few different strategies:

  1. let the develop behavior: now a single polygon is read, and it's the hole (interpreted as a shell from NTS). the actual shell (interpreted as a hole from NTS) is discarded from the output because NTS doesn't find any shell that can be considered a "shell for this hole" (sorry for the poor explaination)
  2. accept the PR behavior: both polygons are read and returned, the invalid data is appended at the end of the multipolygon and the geometry is not valid. It's up to the consumer to fix the geometry applying the logic.
  3. throw an exception if the data is not valid (i.e.: unassigned data is found); this means that ShapefileReader.Read() returns false, because of this code and the reason must be inspected checking the windows messages (not good, but removing the catch is a breaking change).
  4. PUT YOUR IDEA HERE

I'm open to suggestions :)

FObermaier commented 2 years ago

@DGuidi , I think we should

  1. Order the polygon rings by their area (as if they were Polygons without holes)
  2. Pick the largest as 1st shell
  3. Check the others if they are contained by any of the shells and if so, make them a hole of that shell, otherwise create a new shell (in that case we have a MultiPolygon).
  4. Repeat 3 until there are no more rings left
  5. Build the valid [Multi]Polygon.

I think the PR I pointed you to yesterday does it in a similar way.

DGuidi commented 2 years ago

I think the PR I pointed you to yesterday does it in a similar way

I will do a check, thanks The code in the WIP PR looks similar to the code actually in the standard lib to me

DGuidi commented 2 years ago

I'm working on resolving an issue with this geom: MULTIPOLYGON (((-124.134 -79.199, -124.141 -79.316, -124.164 -79.431, -124.202 -79.542, -124.254 -79.647, -124.319 -79.745, -124.396 -79.833, -124.484 -79.91, -124.582 -79.975, -124.687 -80.027, -124.798 -80.065, -124.913 -80.088, -125.03 -80.095, -125.147 -80.088, -125.262 -80.065, -125.373 -80.027, -125.478 -79.975, -125.576 -79.91, -125.664 -79.833, -125.741 -79.745, -125.806 -79.647, -125.858 -79.542, -125.896 -79.431, -125.919 -79.316, -125.926 -79.199, -125.919 -79.082, -125.896 -78.967, -125.858 -78.856, -125.806 -78.751, -125.741 -78.653, -125.664 -78.565, -125.576 -78.488, -125.478 -78.423, -125.373 -78.371, -125.262 -78.333, -125.147 -78.31, -125.03 -78.303, -124.913 -78.31, -124.798 -78.333, -124.687 -78.371, -124.582 -78.423, -124.484 -78.488, -124.396 -78.565, -124.319 -78.653, -124.254 -78.751, -124.202 -78.856, -124.164 -78.967, -124.141 -79.082, -124.134 -79.199),(-124.438 -79.199, -124.443 -79.122, -124.459 -79.046, -124.483 -78.973, -124.518 -78.903, -124.561 -78.839, -124.612 -78.781, -124.67 -78.73, -124.734 -78.687, -124.804 -78.652, -124.877 -78.628, -124.953 -78.612, -125.03 -78.607, -125.107 -78.612, -125.183 -78.628, -125.256 -78.652, -125.326 -78.687, -125.39 -78.73, -125.448 -78.781, -125.499 -78.839, -125.542 -78.903, -125.577 -78.973, -125.601 -79.046, -125.617 -79.122, -125.622 -79.199, -125.617 -79.276, -125.601 -79.352, -125.577 -79.425, -125.542 -79.495, -125.499 -79.559, -125.448 -79.617, -125.39 -79.668, -125.326 -79.711, -125.256 -79.746, -125.183 -79.77, -125.107 -79.786, -125.03 -79.791, -124.953 -79.786, -124.877 -79.77, -124.804 -79.746, -124.734 -79.711, -124.67 -79.668, -124.612 -79.617, -124.561 -79.559, -124.518 -79.495, -124.483 -79.425, -124.459 -79.352, -124.443 -79.276, -124.438 -79.199)),((-124.582 -79.199, -124.586 -79.257, -124.597 -79.315, -124.616 -79.371, -124.642 -79.423, -124.674 -79.472, -124.713 -79.516, -124.757 -79.555, -124.806 -79.587, -124.858 -79.613, -124.914 -79.632, -124.972 -79.643, -125.03 -79.647, -125.088 -79.643, -125.146 -79.632, -125.202 -79.613, -125.254 -79.587, -125.303 -79.555, -125.347 -79.516, -125.386 -79.472, -125.418 -79.423, -125.444 -79.371, -125.463 -79.315, -125.474 -79.257, -125.478 -79.199, -125.474 -79.141, -125.463 -79.083, -125.444 -79.027, -125.418 -78.975, -125.386 -78.926, -125.347 -78.882, -125.303 -78.843, -125.254 -78.811, -125.202 -78.785, -125.146 -78.766, -125.088 -78.755, -125.03 -78.751, -124.972 -78.755, -124.914 -78.766, -124.858 -78.785, -124.806 -78.811, -124.757 -78.843, -124.713 -78.882, -124.674 -78.926, -124.642 -78.975, -124.616 -79.027, -124.597 -79.083, -124.586 -79.141, -124.582 -79.199),(-124.896 -79.199, -124.897 -79.181, -124.9 -79.164, -124.906 -79.148, -124.914 -79.132, -124.923 -79.117, -124.935 -79.104, -124.948 -79.092, -124.963 -79.083, -124.979 -79.075, -124.995 -79.069, -125.012 -79.066, -125.03 -79.065, -125.048 -79.066, -125.065 -79.069, -125.081 -79.075, -125.097 -79.083, -125.112 -79.092, -125.125 -79.104, -125.137 -79.117, -125.146 -79.132, -125.154 -79.148, -125.16 -79.164, -125.163 -79.181, -125.164 -79.199, -125.163 -79.217, -125.16 -79.234, -125.154 -79.25, -125.146 -79.266, -125.137 -79.281, -125.125 -79.294, -125.112 -79.306, -125.097 -79.315, -125.081 -79.323, -125.065 -79.329, -125.048 -79.332, -125.03 -79.333, -125.012 -79.332, -124.995 -79.329, -124.979 -79.323, -124.963 -79.315, -124.948 -79.306, -124.935 -79.294, -124.923 -79.281, -124.914 -79.266, -124.906 -79.25, -124.9 -79.234, -124.897 -79.217, -124.896 -79.199))) image

This multipolygon is valid (two separate polygons each with a hole) but using the suggested logic the data from the shapefile is considered a polygon. I'm working on it.

DGuidi commented 2 years ago

@FObermaier I slightly changed the logic. Now when a potential hole of a shell is found, we check also the shell holes, to see if any hole contains the potential hole. If so, the potential hole can not be considered a hole for the shell, but a separate geometry. It works but I think I can miss few other edge cases.

DGuidi commented 2 years ago

@FObermaier any advice on this?

FObermaier commented 2 years ago

I actually don't quite know how to proceed. There are complaints that current ShapeFile implementation is (dreadfully) slow. That is why I pointed you to the WIP/redo everyting PR. If we do thorough topology test on the rings we read, I assume it gets even worse. Maybe we can enable the thorough test conditionally, if the user of the library really wants it?

DGuidi commented 2 years ago

I actually don't quite know how to proceed.

Actually, I have the same feeling: with the "new" polygon builder logic all the tests are green, but I fear that some behavior not covered by some test can be broken.

That is why I pointed you to the WIP/redo everyting PR

that shows the same problem, actually

If we do thorough topology test on the rings we read

the "new" polygon builder logic I think is similar to the older one, in the sense that I didn't expected to be too much slower. But actually this is just a feeling, I haven't tried no comparison tests

we can enable the thorough test conditionally

I can work on it: a flag can be helpful also to build some performance test

my2cents

DGuidi commented 2 years ago

@FObermaier added a "temporary" static property as a flag, I need to think a bit better about how to manage the entire "flag" thing... suggestions are welcome!

after a brief inspection: adding a flag property to the shapefiledatareader, means that the same flag shall be copied to the shapefilereader and then to the polygonhandler, that is created in a static factory method of the shapefile class, so we must ensure that any call to the factory method should also handle the flag... additionally, the Shapefile.CreateDataReader needs to accept another parameter, just to enable the new behavior.

viable, not so complicated at all, maybe acceptable for an "experiment", but I hope to find a better alternative

DGuidi commented 2 years ago

I didn't expected to be too much slower

but actually - based on some poor-man's performance tests - it's much (~3X) slower

    WRITE => elapsed: '00:00:00.4138680'
    ShapeFileDataReader
    flag DISABLED => elapsed: '00:00:00.4381579'
    flag ENABLED => elapsed: '00:00:01.3507803'
    ShapeReader
    flag DISABLED => elapsed: '00:00:00.4276830'
    flag ENABLED => elapsed: '00:00:01.2875299'
DGuidi commented 2 years ago

just a small note: tests marked as Explicit are always executed in my machine (using VS2019). this is why I used Ignore instead: now I understood that probably it's something related to my setup, and not a general problem.

FObermaier commented 2 years ago

AFAIK Explicit will always run from within the UI (VisualStudio).

FObermaier commented 2 years ago

In GDAL/OGR shapefile driver, all rings are converted to Polygons and afterwards sent to organizePolygons: https://github.com/OSGeo/gdal/blob/363178015ae009fa1187d8c1e20b3587db43aa58/ogr/ogrgeometryfactory.cpp#L1455-L2144

I have not bisected the algorithm altogether, but it seems to be a bit of everything...

DGuidi commented 2 years ago

AFAIK Explicit will always run from within the UI (VisualStudio).

exactly what happens to me

DGuidi commented 2 years ago

"For faster computation, a polygon is considered to be inside another one if a single point of its external ring is included into the other one"

maybe we can refactor this check to speed up a bit the computation:

bool IsHoleContainedInShell(LinearRing shell, LinearRing hole) =>
   //shell.EnvelopeInternal.Contains(hole.EnvelopeInternal) &&
   PointLocation.IsInRing(hole.GetCoordinateN(0), shell.Coordinates);
FObermaier commented 2 years ago

I'd assume that without the cheap envelope precondition test we'll see a performance degeneration.

DGuidi commented 2 years ago

I'd assume that without the cheap envelope precondition test we'll see a performance degeneration.

I just mean that IsHoleContainedInShell check should just use PointLocation.IsInRing

FObermaier commented 2 years ago

I got that, but PointLocation.IsInRing is the expensive test (compared to shell.EnvelopeInternal.Contains(hole.EnvelopeInternal)). If we don't do the envelope test, the PointLocation.IsInRing test is performed more often which will lead to performance degeneration.

DGuidi commented 2 years ago

If we don't do the envelope test, the PointLocation.IsInRing test is performed more often which will lead to performance degeneration.

yep now I see your point. curiously, from performance tests I see performance improvements

FObermaier commented 2 years ago

yep now I see your point. curiously, from performance tests I see performance improvements

If you only have polygons with holes that might be true, but for multipolygons whose shell ring envelopes don't intersect, I doubt that.

DGuidi commented 2 years ago

can be this PR cancelled, considering the discussion in #81, and in particular this comment ? Maybe part of this work can be added to #69 , if worthy