Turfjs / turf

A modular geospatial engine written in JavaScript and TypeScript
https://turfjs.org/
MIT License
9.16k stars 934 forks source link

Support all feature types for bboxClip, including FeatureCollection #1565

Open stevage opened 5 years ago

stevage commented 5 years ago

I was a bit surprised that there doesn't seem to be a way to clip a whole FeatureCollection to a given rectangle. This could be implemented by simply extending the range of types that bboxClip supports, including Points.

plepe commented 5 years ago

I made a bbox-clip function based upon turf which supports GeometryCollections, maybe this could help you: https://www.npmjs.com/package/bbox-clip

morganherlocker commented 5 years ago

Generally we've made most functions operate at the feature level. This encourages the use of spatial indexing, and avoids situations where users throw huge unoptimized collections at turf expecting it to work like a database. That said, we've been somewhat inconsistent with this design, which causes confusion.

stevage commented 5 years ago

Ah this is interesting.

This encourages the use of spatial indexing

Can you elaborate on that? I never considered spatial indexing in the browser before. I guess you mean something like flatbush? Would it be crazy for Turf to incorporate indexing?

That said, we've been somewhat inconsistent with this design, which causes confusion.

Maybe worth documenting more of the design decisions, whether currently consistently followed or not.

morganherlocker commented 5 years ago

Can you elaborate on that? I never considered spatial indexing in the browser before. I guess you mean something like flatbush? Would it be crazy for Turf to incorporate indexing?

Yes, exactly. It's generally an anti-pattern to use any turf functions against a large number of features without indexing first. I've thought about incorporating indexing to make this easier, but I've found that the index pattern is heavily dependent on the shape of data being processed, which is outside the scope of Turf. Spatial indexing is a really deep rabbit hole and the list of valid indexers is really long (eg: flatbush, rbush, kdbush, tile-cover, node-s2, h3-js, etc. (!!)), so I would not feel comfortable giving an overly broad recommendation.

Maybe worth documenting more of the design decisions, whether currently consistently followed or not.

We have the "Getting Started" section on the website today, which is probably not the right fit for an advanced topic like this. Maybe we need an "Advanced Usage" section that gives general tips for high performance applications like spatial indexing and streaming?

stevage commented 5 years ago

It's generally an anti-pattern to use any turf functions against a large number of features without indexing first.

Yeah, I have learnt this the hard way recently. Quite surprised at just how slow it turned out to be to run operations like clipping polygons to bboxes, or point-in-polygon lookups. But incorporating indexing seemed really complicated.

Spatial indexing is a really deep rabbit hole and the list of valid indexers is really long (eg: flatbush, rbush, kdbush, tile-cover, node-s2, h3-js, etc. (!!)), so I would not feel comfortable giving an overly broad recommendation.

Do any of them perform generally well in a wide variety of situations, even if outperformed in specific situations? Maybe if there was some mechanism to make it easy to enable some general purpose indexing method, and then a plugin mechanism to use something more specific? I dunno, just spitballing.

(To be clear, I'm not suggesting work for someone else to do, I'm hoping to clarify the problem and maybe find a solution that I could implement.)

We have the "Getting Started" section on the website today, which is probably not the right fit for an advanced topic like this.

Yeah. That section is (reasonably) enough just installation. Maybe a "Guide to using Turf with large datasets"? It might also be worth flagging in certain operations that they shouldn't be naively run on large arrays of features, with a link to that section?

Is there an example somewhere of how to combine Turf with an indexing module for operations like point-in-polygon, finding overlapping polygons etc?

rowanwins commented 5 years ago

Hey @stevage

A couple of Turf operations were/are using rbush under the hood, and in v7 I have included a spatial-index`module which basically wraps rbush to make it simpler to send in geojson, I can't think of which modules use it off the top of my head and I hadn't decided whether it'd include it in the docs yet for public use, or whether it would be more of an internal module. That said I think the use of indexing within Turf has been as a surrogate to having decent sweepline algorithms which I'm now getting more familiar with so I'm not sure it'll stay in the longer term.

In terms of the whole indexing strategy thing I think this is one of those challenging topics. Improved documentation around the topic and alerting people to potential pitfalls would certainly be a start.

Cheers

morganherlocker commented 5 years ago

Do any of them perform generally well in a wide variety of situations, even if outperformed in specific situations? Maybe if there was some mechanism to make it easy to enable some general purpose indexing method, and then a plugin mechanism to use something more specific? I dunno, just spitballing.

rbush does a good job in most situations where you have a dataset that fits into RAM. I would not want rbush to be used directly inside Turf though as a general purpose index, since there are performance tradeoffs that would be incurred by all users, and this goes against the design of Turf. The rbush API is small and well documented, so it is a great option when needed, but there are many cases where a different indexing strategy is needed, such as when your dataset does not fit into RAM. Since these libraries already work together nicely, this should continue to be up to the user to implement.

Is there an example somewhere of how to combine Turf with an indexing module for operations like point-in-polygon, finding overlapping polygons etc?

tile-reduce uses vector tiles as an indexing strategy. This strikes a pretty good balance by letting you brute force algorithms without having to think about tradeoffs too much. Many tile-reduce scripts will also use rbush for additional performance, depending on the operation. If you are running bulk analysis in node, and don't want to think about optimization too much, I would check it out.

A couple of Turf operations were/are using rbush under the hood, and in v7 I have included a spatial-index`module which basically wraps rbush to make it simpler to send in geojson, I can't think of which modules use it off the top of my head and I hadn't decided whether it'd include it in the docs yet for public use, or whether it would be more of an internal module.

As @rowanwins notes, we've used indexing in Turf in the past in limited situations. These should remain restricted to where the index is needed to perform an algorithm efficiently in nearly all situations, but not when it would benefit a particular type of a dataset at the expense of others. Its important to note that nearly all indexers have significant startup cost. They do some expensive work up front to make future calls fast. In turf, functions should never have state, so this pattern cannot be implemented internally.