Allow for parsing of geojson Features and FeatureCollections

willcohen commented 6 years ago

Re #11, this re-adds jts2geojson, and uses its GeoJSONFactory to parse the string differently depending on if it is a Geometry, Feature, or FeatureCollection.

One question is -- what should be returned with a FeatureCollection? This PR currently returns a vector of Geometries.

worace commented 6 years ago

Looks pretty good; any chance we could add a test that describes what happens with the properties from a GeoJSON input once we parse them? I think JTS may have some kind of metadata object it will stash them in, but I'm not actually sure about that.

And I would be a little bit concerned about what happens to properties if we convert them internally to, say, a Spatial4J object. I don't think it's a blocker if we can't support them but it would be nice to be explicit about how they behave.

willcohen commented 6 years ago

I'll work on that test tomorrow.

As of right now, the properties are totally dropped (jts2geojson has .getGeometry and .getProperties, but this PR is only using .getGeometry right now.

If there is a way to put the properties in the JTS object, that'd be very cool, but I'm not seeing anything like that in the docs for Geometry. My assumption here is that any use of properties attached to a geometry – whether from geojson, stuff in a shapefile dbf, or other columns from a postgis feature – lives entirely outside this library. The only metadata that might stick with a geometry here would be the projection, which doesn't apply to spatial4j or geohash.

I'll look to see if there's something in JTS I'm missing, but I figured that adding properties in would require a whole complicated hash-map-based schema for recreating all possible geojson/postgis/shapefile/whatever else data, which seems complicated.

willcohen commented 6 years ago

Test added. I still am not seeing anything with respect to metadata or properties. If it comes in as a Feature, it goes out as a Geometry. If it comes in as a FeatureCollection, it can only get mapped out as a set of geometries that are converted individually. @worace

willcohen commented 6 years ago

After trying to test some of this functionality more fully with real-world usage, it's becoming a little clearer that fully dropping the properties might actually be a missed opportunity, especially to be able to go between geojson and other spatial formats. I think the default functionality of the read and write geojsons should still be as currently represented in the PR – to default to clean interoperability with the rest of the library – but that there should be some kind of optional options map argument to allow for understanding of properties.

There might be three optional flags:

By default, only geometries will be returned. Enabling a "keep properties" flag will instead return hashmaps with :geometry and :properties keys.
By default, collections will be returned as individual geometries/features.
- If properties are not kept, this would be represented as a vector of geometries. If properties are kept, this would be represented as a vector of these feature hashmaps described above.
- When a "collection" flag is enabled, if properties are dropped, then a single JTS GeometryCollection is returned. If properties are kept, then a single hashmap with a GeometryCollection for :geometry and a vector of properties hashmaps for :properties is returned.
When a "property keywords" flag is enabled, then the properties hashmaps would use keywords instead of strings (:name versus "name")

Similarly, there should be ways to take either plain geometries or these feature hashmaps and be able to pass configuration arguments to to-geojson to choose to export Geometry, Feature, GeometryCollection, or FeatureCollection geojsons. This will require a little more thought, and I suspect it'll become clearer after I work through the read-geojson changes.

willcohen commented 6 years ago

Added that in. Trying to write properties via jts2geojson looks to be pretty complicated. It might be okay to leave it to only be able to read properties for now and tackle writing later, which is a more important use case.

willcohen commented 6 years ago

Would you like to tackle this for 2.0 as well? Should I modify to rebase off of proj4j?

worace commented 6 years ago

Yes I am hoping to include it. Do you think it will conflict? If so a rebase would be helpful.

willcohen commented 6 years ago

It does, but nothing too crazy. I'll rebase on top of #10 and try to keep it up-to-date if anything else changes there, under the assumption that the proj4j branch gets merged first -- you may want to think through if the current method of handling properties seems right to you.

willcohen commented 6 years ago

Rebased. The primary changes involved moving all the new io tests to also use resource files.

willcohen commented 6 years ago

While #17 is more focused on wkb, it may actually make sense for read-geojson to have the same multiple arity function being discussed there. While best practices involve having geojsons use wgs84, it has been acceptable in the past to have them use projected data, and it's probably a good idea to allow setting an optional SRID for anything that is already projected.

willcohen commented 6 years ago

Rebased to 2.0.0-rc-1. I added a :srid flag to read-geojson to line up with the changes to #17. Note that jts2geojson doesn't currently have any understanding of SRIDs, which means that the previous treatment of adding a srid argument to GeoJSONReader doesn't work. Instead, this wraps the various generated geometries with a (jts/set-srid) defaulting to 4326. @chen-factual

worace commented 6 years ago

@willcohen sorry for the delay on this but I finally got some time to go through it more closely.

I think you've done a great job handling all of these different GeoJSON cases and adding a bunch of flexibility to this API. My main reaction is that I think handling all of these different customizations gets pretty complicated, and I'm not convinced it's totally necessary. I think having a function like this that takes so many different types of inputs and conditionally returns different types of outputs can make for a tricky API. And while we could potentially split these apart more, that would leave us requiring users to figure out which function to call based on what type of geojson entity they have.

So I'd like to propose a more restrictive but simpler interface that wouldn't handle quite so many different input types and options.

For one thing I think we can drop the keywords? option and just make this the default. Or we could make string keys the default if you think that's preferable. Users can stringify them themselves if needed.

The last 2 options deal with whether to treat something as a feature or not, and whether to treat it as a collection or not. I don't feel a great obligation to handle all of these possible cases for the user, so I'm wondering if we can get away with just standardizing on the most maximal combination, which in my mind is to just return everything as a collection of Features. This handles the FeatureCollection case by default, and the other 2 cases (Feature and Geometry) can be coerced into it easily.

So we'd have something like:

FeatureCollection -> Unpack to list of features (each with a properties map and a jts Geom Object)
Feature -> Return it as a list of one feature
Geometry -> Wrap it with an empty properties map and return it as a list of one

The downside I see is this potentially feels a little strange for some use-cases (e.g. passing a geometry and getting back a feature). But the additional wrapper structures can be managed easily (just pull out the :geometry key), and seems like it might give the most flexibility without a ton of fancy processing on our end.

Here's a draft of what I put together as I was trying out this idea: https://github.com/willcohen/geo/compare/geojson...worace:hw/gj-simplification. Would love to know what you think, or if there are compelling reasons I have missed for supporting the more varied interface.

willcohen commented 6 years ago

@worace Thanks for looking at this! Generally speaking, I think the idea of simplifying this and doing less seems like a good move, and this proposal looks like the most straightforward way to do it.

I think keyword-style keys rather than stringified are certainly better idiomatically and are what I'd pick as the default, all else being equal -- that said, if speed is the name of the game (and #18 makes me think that it may well be), then jts2geojson by default spits out strings for the names and doing less manipulation on what it returns in the properties might be a reason to go with strings, especially if we're removing options. Is it a problem to have a hashmap that returns :geometry and :properties but then within properties uses string keys?

Simplifying FeatureCollection and Feature like you suggested seem like the right move, though I want to think a little more about adding in a null properties for Geometry by default. In the wild I personally generally only come across Features or FeatureCollections as shapefile etc. replacements, but I could imagine that if people are getting some streamlined geometry-only geojson constructions (which I occasionally have with APIs returning various shapes), injecting properties might seem counter-intuitive and misrepresent what's being sent to them, especially if the user is supposed to be attaching other properties to that shape from elsewhere in an API response. It does still feel reasonable to me that if the geojson is Geometry only, then the library could want to return a JTS geometry only.

I agree that the collection? option may not be particularly important, but I want to look at some other usages of libraries that people might be interoperating with to see if they ingest stuff row by row or in two chunks of geometries and properties (or if they have the option for both). If things expect that two chunk model, it might be worth keeping in.

I'll do a little homework over the next day or two and assuming I don't come up with anything which seems like it conflicts with these changes I'll merge in the proposed simplifications, if that's okay!

willcohen commented 6 years ago

Separately, I don't know why I didn't think of using a protocol. Much simpler.

worace commented 6 years ago

Sounds great.

Keywords seem good to me. I'm not too worried about the overhead of keyword-izing property keys.

I could imagine that if people are getting some streamlined geometry-only geojson constructions (which I occasionally have with APIs returning various shapes), injecting properties might seem counter-intuitive and misrepresent what's being sent to them

This is fair. Maybe we could also include a read-geojson-geometry method which people could use if they knew they would only be passing in geometries (as opposed to the more general functions which are one-size-fits-all)

I want to look at some other usages of libraries that people might be interoperating with to see if they ingest stuff row by row or in two chunks of geometries and properties (or if they have the option for both)

Sounds good will be curious to see what you come up with. Most of the ones I have seen are either Geometry-only, like the JTS Implementation, or use some kind of GeoJSON base class to conditionally return either a Geometry, Feature, or FeatureCollection, as is the case with this library we're using. This is great from a type-safety perspective but IMO it's a little clunky since users have to then do some kind of matching to determine which kind they got back.

willcohen commented 6 years ago

While some other libraries do use GeometryCollections separately from properties (Geotools' shapefile read/write, with the dbf handled elsewhere), I am now persuaded that the handling here just should be simple and predictable. Any user needing this kind of interop can deal with those transformations as needed on their own. I added a read-geojson-geometry like you suggested, and I also added a geometry-collection construction function to geo.jts, kind of like coordinate-sequence, to help enable those transformation operations when needed.

Separately, I realized that now that we have to-jts, these io operations can run on all shapelikes and not just JTS geometries -- we should be able to make a wkt or geojson or wkb from a geohash as well! -- so I changed those functions to accept shapelike.

worace commented 6 years ago

@willcohen sounds great, thanks for doing that research. This is looking pretty good I think. I'm ready to merge it unless there are any more changes you'd like to see first.

willcohen commented 6 years ago

I think it’s ready to merge too!

Factual / geo

Allow for parsing of geojson Features and FeatureCollections #12