dr-jts / jts

The JTS Topology Suite is a Java library for creating and manipulating vector geometry.
Other
22 stars 6 forks source link

Polygonizing with OverlayNG? #3

Closed apjoseph closed 4 years ago

apjoseph commented 4 years ago

I just read your blog post and am excited to see if this does indeed eliminate dreaded TopologyExceptions!

I'm looking to test polygonization (unique areas of overlap) on a large Multipolygon postgis-derived dataset.

I took a look at the new logic in the overlay-sr and I'm not seeing an an obvious equivalent to Polygonizer in the branch and wondering if/where such functionality exists.

In the past I've had to split up datasets that have complex polygons with many thousands of vertices to avoid mem overflow on a dataset of only 100MB with 20+GB of available memory for polygonization. So I'm also curious if overlayng has any functionality that might help avoid overflow on polygonizing operations (like st_subdivide in postgis for example) while maximizing robustness.

dr-jts commented 4 years ago

So are you running a noding process over a set of polygons, and then polygonizing that to compute a full overlay?

You can use OverlayNG for the noding phase in the same way as the original overlay - just UnaryUnion the linework all together (see code here).

Unfortunately there isn't (yet) a Polygonizer equivalent in the new overlay (although of course it does have all the needed machinery for its own use). OverlayNG uses a new(ish) graph structure, which is more memory efficient than the one Polygonizer uses. So I have thought that it would be nice to change Polygonizer to use this new graph model. Maybe I can work on that sooner rather than later.

I'm surprised there is that much memory overhead for polygonization. Hopefully using the new graph structure can help with this. There is also another approach I've worked on in the past using a sweepline approach which has very low memory overhead. It's in my work queue to dust that off and get it released. Any chance of getting a copy of the problematic datasets?

dr-jts commented 4 years ago

I have also had some thoughts about using OverlayNG to provide a full polygonal coverage overlay (either to reduce an overlapping set of Polygons to a coverage, or to overlay two coverages). This will take a bit of work, but should be feasible.

apjoseph commented 4 years ago

Ah, so basically the idea is to use UnaryUnionNG.union on the dumped polygon linework and pass it to the existing Polygonizer which shouldn't itself introduce any new topology exceptions, correct?

This is somewhat similar to the approach I was already using in 1.61. I didn't use the existing polygonization directly on source polygons -but used the old snaprounding functionality that was already present :

  1. Reduce precision to two decimal places
  2. Extract Segment strings from MultiPolygons using SegmentStringUtil.extractSegmentStrings()
  3. Use org.locationtech.jts.noding.snapround.MCIndexSnapRounder to node/snapround segement strings
  4. Convert noded substrings into linestrings using SegmentStringUtil.toGeometry()
  5. UnaryUnionOp.union() snaprounded linestrings (where mem issue occurred)
  6. Polygonize unioned geometry

So the mem issue I was having wasn't actually on the polygonization portion, but in the unioning of all linestrings together into a single geometry for polygonization.

That coverage overlay functionality, sounds very exciting and helpful by the way -and I am greatly looking forward to it.

dr-jts commented 4 years ago

Ah, so basically the idea is to use UnaryUnionNG.union on the dumped polygon linework and pass it to the existing Polygonizer which shouldn't itself introduce any new topology exceptions, correct?

Correct. And should be a lot simpler than the process you are currently using. So now the process is:

  1. Convert polygons to linework (using say getBoundary) and collect into a MultiLineString.
  2. Call UnaryUnionNG on that geometry to produce a fully noded set of edges, using an appropriate precision model (note there is no need to round the linework coordinates - OverlayNG takes care of that itself)
    1. Polygonize the resulting noded linework

Let me know if it works! And particularly about memory usage.

That coverage overlay functionality, sounds very exciting and helpful by the way -and I am greatly looking forward to it.

Roger that. It's on the list.

dr-jts commented 4 years ago

@apjoseph Do you have stats for the number of vertices in your 100 MB file? By estimating from a shapefile (which is binary of course) I'm thinking that's about 50M vertices?

That is quite a large dataset, and it doesn't surprise me that it blows out memory with the old overlay. And I'm thinking it's likely to be an issue with the new overlay as well. However, as I mentioned I do have a experimental project which could potentially process this size of dataset in very little memory. I will see if I can get that into a state to release.

apjoseph commented 4 years ago

I did a test this morning with overlayNG. The linestring union threw a topology exception:

Exception in thread "main" org.locationtech.jts.geom.TopologyException: found non-noded intersection between LINESTRING ( -118.299085 33.821959, -118.299085 33.821818 ) and LINESTRING ( -118.299085 33.821959, -118.299085 33.821887 ) [ (-118.299085, 33.821959, N
aN) ]
        at org.locationtech.jts.noding.FastNodingValidator.checkValid(FastNodingValidator.java:140)
        at org.locationtech.jts.geomgraph.EdgeNodingValidator.checkValid(EdgeNodingValidator.java:81)
        at org.locationtech.jts.geomgraph.EdgeNodingValidator.checkValid(EdgeNodingValidator.java:46)
        at org.locationtech.jts.operation.overlay.OverlayOp.computeOverlay(OverlayOp.java:231)
        at org.locationtech.jts.operation.overlay.OverlayOp.getResultGeometry(OverlayOp.java:183)
        at org.locationtech.jts.operation.overlay.OverlayOp.overlayOp(OverlayOp.java:86)
        at org.locationtech.jts.operation.overlay.snap.SnapIfNeededOverlayOp.getResultGeometry(SnapIfNeededOverlayOp.java:75)
        at org.locationtech.jts.operation.overlay.snap.SnapIfNeededOverlayOp.overlayOp(SnapIfNeededOverlayOp.java:37)
        at org.locationtech.jts.operation.union.UnaryUnionOp.unionNoOpt(UnaryUnionOp.java:280)
        at org.locationtech.jts.operation.union.UnaryUnionOp.union(UnaryUnionOp.java:216)
        at org.locationtech.jts.operation.overlayng.UnaryUnionNG.union(UnaryUnionNG.java:49)

rawGF = GeometryFactory(PrecisionModel(10.0.pow(6)),4326) fixedGF = GeometryFactory(PrecisionModel(PrecisionModel.FLOATING))

Dataset is about 10 million vertices. The way I handle in 1.16 is by creating a grid and recursively subdividing it in parallel based on an index of all individual line segments until I get down to a manageable amount and then doing parallel union/polygonization on each portion of the grid,throwing out any polygons outside of it, and then recombining any split pieces. This is not ideal since the grid itself can introduce errors when recombining, but this does generally work well in speeding things up and reducing memory.

That experimental project sounds helpful!

dr-jts commented 4 years ago

That's disappointing. Are you able to provide a sample dataset which causes the issue? (Perhaps put a breakpoint and capture the two geometries which cause the exception?)

Clever approach with the subdividing, but yeah, sounds tricky to get it perfect.

I'm going to push my other project to Github soon, and will put a link on this issue.

apjoseph commented 4 years ago

Sorry for the delay. Here are the raw multipolygons whose boundaries cause the exception when unioned.

dr-jts commented 4 years ago

Thanks. What precision model was being used?

Actually I see now that the unary union is not using the OverlayNG code. So that's a bug.

Although I am not able to reproduce the failure using either overlay method.

apjoseph commented 4 years ago

Thanks. What precision model was being used?

    val gf4326 = GeometryFactory(PrecisionModel(10.0.pow(6)),4326)
    val gc = geoms.map {it -> it.boundary }.let{ gf4326.createGeometryCollection(it.toTypedArray())}
    UnaryUnionNG.union(gc,gf4326.precisionModel)
dr-jts commented 4 years ago

I did fix some bugs over the past couple of days. Can you try again with the most recent code?

Although as noted above, this isn't actually using OverlayNG inside the unary union (which is a bug I will work on).

The other way to run the boundary linework through OverlayNG is to make a MultiLineString from it, and union it with an empty LineString. This is actually what UnaryUnion does internally (since there is no benefit to using a cascaded union approach with lines).

dr-jts commented 4 years ago

It also seems like there is quite a bit of precision in that data. To me it looked like a scale factor of 10^8 is more appropriate.

I was able to run UnaryUnion using both original overlay and OverlayNG (with scale-1e8). Interestingly OverlayNG was about 4 times faster (perhaps because the old overlay needed to use some snapping, which is quite slow).

apjoseph commented 4 years ago

Can confirm I was able to get it to go through after reducing precision to 10^6 using the new OverlayNG.overlay method -and using 10^8 on UnaryUnion. That's a relief!

Also verified that the reducePreicsion algorithm also doesn't create any empty geometries when reducing precision on any of the polygons -which was another major problem I had in the old JTS. Very excited about that as well.

dr-jts commented 4 years ago

Can confirm I was able to get it to go through after reducing precision to 10^6 using the new OverlayNG.overlay method -and using 10^8 on UnaryUnion. That's a relief!

Good to hear.

Also verified that the reducePreicsion algorithm also doesn't create any empty geometries when reducing precision on any of the polygons -which was another major problem I had in the old JTS. Very excited about that as well.

The old JTS GeometryPrecisionReducer is a bit of a hack which has all kinds of potential problems, and isn't very efficient. So it will be replaced by the new code as soon as its had a bit more testing. (I'll keep the old API, just replace it with a call to the new OverlayNG PrecisionReducer.)

dr-jts commented 4 years ago

@apjoseph I pushed my alternative FORSE overlay code to a repo here. It's a WIP, but it should work well enough for you to test its characterstics for memory efficiency. I can provide more details if interested.

apjoseph commented 4 years ago

Excellent, I'll likely get around to running this on large dataset likely in the next week or two. Looking like this is the most relevant. I noticed this lib doesn't have a pom.xml or build.gradle -is forsee meant to be added as a module in JTS or standalone?

dr-jts commented 4 years ago

Yes, I haven't put any work into a build chain for FORSE. It's more likely to end up as a JTS API.

dr-jts commented 4 years ago

You might also be interested in the recent addition to the OverlayNG branch of the SnappingNoder. It is a noder which uses full-precision snapping rather than snap-rounding. It's very new, but so far has been very robust in my testing. I'm optimistic that it will replace the floatng-point noding strategy used currently, and that it will reduce the need for using snap-rounding.

Usage examples here. As usual I'm keen to get some beta-testing!