Geometry transformations

willcohen commented 6 years ago

For #8, an implementation:

Implemented

Add proj4j dependency
Add awareness of SRIDs to geo.jts construction functions
Add helper functions to determine the number of dimensions in a geom, to enable the creation of EWKBs that are SRID-aware
Add read-wkb-hexand to-wkb-hex functions that can read/write hex string version of WKB
Add to-ewkb, and to-ewkb-hex functions that can write binary and hex string objects with SRIDs
Add transformation of JTS geoms and coords between SRIDs, CRS names, and proj4 strings
Modification to Shapelike protocol to integrate transformations and conversions to JTS.
Check that all functions accepting Shapelike objects are properly aware of projections when necessary.
- resegment has been changed to first transform its linestring to WGS84, and then transforms all the output linestrings back to the original projection.
Check that all functions accepting Point objects are also aware of projections, since some JTS Points may be using different projections now
- distance operates on geohash points, so those should already be projection-aware now
Add tests

willcohen commented 6 years ago

@worace After thinking about how to extend this to protocols a little more, I'm also thinking that org.locationtech.jts.geom.Coordinate should be removed from the Point protocol. geohash's WGS84Point, JTS Point (with its SRID, which is always WGS84 before this pull request), and spatial4j Point all refer to unambiguous locations on the earth. However, Coordinate is purely cartesian. This distinction is what keeps the logic used to actually do these projection transformations consistent. For a given geometry, each coordinate within that geometry is individually transformed from the geometry's existing spatial reference system to the target one. Once that's complete for all coordinates, the geometry's spatial reference system identifier is updated and that new geometry is returned.

Adding (to-srid [srid]) to the Point protocol (and, additionally, Shapelike) should be straightforward for everything but coordinate, since we can just call something like (transform-geom (to-jts [this]) srid) and get back a projected JTS object. But for Coordinate, we'll never know what projection it should be coming from. Casting it to WGS84 seems like the wrong move, since that Coordinate might actually just be a cartesian representation of a non-WGS84 projection and would then be silently and incorrectly transforming an object back to the earth.

worace commented 6 years ago

I'm also thinking that org.locationtech.jts.geom.Coordinate should be removed from the Point protocol.

OK after reading through this and doing a bit more background reading I do see your point and am inclined to agree at least conceptually. geo.spatial is intended to deal with geodesic stuff and things that can be dealt with in earth units like meters etc, so Cartesian coords don't necessarily fit cleanly in that model.

On the other hand part this library's intent is to provide an easy to use "Do What I Mean" interface for some of these operations. In that context some users just want to assume everything is an earth point in the most generic off-the-shelf projection, and in the case of a coordinate just dumping it into WGS84 does a decent job of this.

So, if we are looking at taking this out, I'd want to make sure we still include a decent escape hatch for the "I just have a lat/lon and don't really know or care about the projection...please figure it out for me" use-case.

There is also a bit of compatibility concern where there are places in the library where we are already using coordinates in this way so it would take a bit of effort to migrate these (but definitely doable if we decide it is worth while).

So, thinking through these options a bit more it seems like we either:

Remove org.locationtech.jts.geom.Coordinate from the Point protocol. This maintains the conceptual purity of this abstraction, and saves us from "lying" to users by accepting their cartesian data as if it were Geodesic WGS84/4326 data. It introduces a bit of headache around compatibility and maybe more user friction since people who might have previously not thought about projections at all now have to think about it.
Leave it as is and just assume a Coordinate in this context represents a lat/lon encoded in the most generic SRID there is. This makes the interface more complete by allowing us to accept another type of point representation, but does have the potential to screw things up if people try to do spatial ops on Coordinates representing non-4326 data. It also saves us from compatibility issues, both internally within the library code and for users who may have been depending on this behavior.

Reflecting on these a bit more I guess I'm not sure I see what's so bad about 2, but I would be interested to hear your thoughts on the matter. Do you think there is a high chance of people getting tripped up because we mis-converted Coordinates for them?

As one example you can currently use geo.spatial/distance to measure distance between 2 Coordinates, and it will just convert it to WGS84:

(geo.spatial/distance (org.locationtech.jts.geom.Coordinate. -70.0024 30.0019)
                      (org.locationtech.jts.geom.Coordinate. -70.0025 30.0020))
=> 14.696091061340983

And PostGIS will do similarly:

select ST_Distance(ST_Point(-70.0024, 30.0019)::geography, ST_Point(-70.0025, 30.0020)::geography);
 st_distance 
-------------
 14.69609116
(1 row)

It seems like this is probably the most expected behavior for the majority of use-cases, even though it's technically making an unverifiable assumption about the source of the provided coordinates.

willcohen commented 6 years ago

Thanks for looking at this, @worace!

I absolutely agree that the library needs the ability to just default to lat/long when possible, and that users shouldn't have to think about projections when they don't need to. That said, I also think that functionality is exactly what JTS Point is equipped to deal with. That's how it's worked so far in geo for points, with a factory that always uses 4326 for points it generates. This means that when the objects get passed to other things using JTS, they know how to handle them and everything works well.

The example you give – using geo.spatial/distance on two coordinates – is an example of something that I don't think should actually be possible. The purpose of those coordinate objects isn't to be spatially referenced on the earth and then used for calculations, but instead to be internal objects that build up into things like points or polygons that are themselves spatially referenced (or, for example, to be transformed one by one within a polygon using a proj4j transformation).

If the goal of the library is to be "do what I mean," then I'd suggest that it shouldn't allow the use of internal-ish things and pretend that they're something else. If a user has two point-like objects and wants to think about doing operations on them, let's make sure they use a point, and let's default it to lat/lng. So, calling spatial/distance on two points should continue to be very possible – if they use a point.

The way I dealt with this in the pull request is to say that when calling (point x y), continue to default to WGS84, and only put a different projection in with a 3-argument (point x y srid) or 2-argument (point coord srid) where the first one is an (unprojected) JTS coordinate. That coord argument as an input still shows the importance of having a function like (coordinate x y) available that makes a JTS coordinate, even if that object type isn't elevated to act the same as a spatially referenced point right off the bat via the protocol.

In the PostGIS example, remember that you're still actually having to specify the projection system with ::geography. Removing the cast to geography would make it closer to the role of what Coordinate is for, with PostGIS reporting cartesian distance between the coordinates, in this case using a slightly-crazy unit of degrees.

Let me know what you think. Is there a reason that you think a user should want to be doing all these operations on a coordinate instead of a point?

willcohen commented 6 years ago

For things like spatial/distance which take points as arguments, if the worry is breakage for existing users, maybe we could still remove Coordinate from the protocol and then for each of those functions:

Check if the function is being passed two coordinates instead of points.
If so, warn the user that they should be using points instead, and manually cast them into points before continuing the operation.

With distance, if it took points as arguments, it'd actually mean that it could also become projection-aware without having to worry about it. The call to make a geohash point would always need to reproject it back to lat/lng. In cases where the functions just take things that are in the point protocol, things should generally continue to work okay.

willcohen commented 6 years ago

Sorry @worace – last comment. I keep rethinking this!

Another option to minimize breakage would be to keep it in the protocol, except deprecate it and put warnings there instead of in all the other functions, which could get verbose. This would allow it to keep working for users who expect that behavior, while still putting it on a path to be removed.

That said, if we're talking about bumping it to 2.0.0 anyway because the API is changing already, it seems like a not-unreasonable time to remove a logical inconsistency.

worace commented 6 years ago

The example you give – using geo.spatial/distance on two coordinates – is an example of something that I don't think should actually be possible.

Yeah I agree with you that it is technically incorrect, but in practice when someone asks for that it's almost certain that they mean "treat this as a lat/lon", which is what we do.

It wouldn't be a huge burden to require users to turn stuff into points first (this is likely what we'd end up doing), but would be a change to several parts of the library.

Is there a reason that you think a user should want to be doing all these operations on a coordinate instead of a point?

Maybe not explicitly creating a Coord and doing distance stuff with it, but more likely getting Coordinate objects produced by other JTS operations (or even other functions within this library). I know I have run into this myself for example when trying to do things like this.

We can obviously move to a world where we require you to first call (point coord srid) to translate it (it sounds like this is what we may end up doing). But it would be a change to several parts of the library.

Partly it's a question of what's the goal for supporting projections -- is it sufficient to have more-or-less bolt-on support for taking existing points or shapes (assumed to be WGS84) and translating them into other projections? If so that seems fairly straightforward and what you have implemented seems to get most of the way there.

Or, is it necessary to integrate awareness of projections into the rest of the library, such that if I ask for the distance of one point in CRS 4326 and another in CRS 26911, it recognizes that and negotiates a common reference between them?

I'd have to study up more on the ins and outs of projection handling in Spatial4J and Geohash.java but this seems fairly involved to me. Is it sufficient to just translate everything back to WGS84 when doing spatial operations? Even if that works, it seems like it could be a performance penalty for what would likely be a relatively small portion of usage. If you have done experiments into implementing this kind of translation I'd be curious to know what you came up with. Maybe it's not as bad as I am imagining.

willcohen commented 6 years ago

Thanks for the reply! Here’s some background:

Motivation

Much of our current spatial analysis, though enabled via Clojure, is highly dependent on PostGIS. Most municipal or state-level GIS data comes in in some kind of state plane projection, and there’s definitely value in doing calculations in those projections — the area of a feature, in a ftUS state plane, gives square feet, which makes sense for a building or property. I don't know of any good clojure libraries that are aware of dealing with projected JTS objects.
It would reduce the burden on a backend database and allow Java to take on more of the processing of these datasets if the objects don’t always have to be converted to lat-long with a round-trip to PostGIS before entering Clojure-world. JTS and proj4j do enable this in Java, but to make this simple in Clojure such functionality either needs to get added into geo or implemented in some new wrapper of a wrapper library.
We have a fair number of cases where we want to analyze this projected data relative to geohashes (and spatial4j, to the degree that this library already compares between the two): which geohash does this building or set of buildings fit in? What geohashes at a given precision cover a city’s boundaries? What are that geohash’s boundaries and area, and can we get that shape reprojected back into the original state plane for comparison to the original data? Doing this requires some kind of analysis that relates these non-JTS objects to projected JTS objects.

Adding to geo versus enabling elsewhere

This library is the best wrapper of geospatial libraries I know of already, and it’s very handy that it translates between the three libraries. We've been using an internal fork of it since 2016, but moved back to using your direct version after last year's updates (with the jts 1.15 update also re-applied internally up until now). So, above all else, thanks for creating it in the first place. We find a lot of value in it!
The fact that it doesn’t actually handle the SRID of valid JTS objects means that a user with projected data still has to pre-solve that on their own somehow, which currently means wrapping this library with other logic. It’s only an issue with non-lat-long data, but lots of GIS info is that.
Geotools is a Java library that enables reprojection of JTS, but there’s a lot of boilerplate to make it happen, it needs a lot of dependencies, and it’s LGPL. proj4j can create the transformation, but applying it to all of a Geometry’s coordinates still means dropping down to Java and JTS, which is basically what these proposed transform functions take care of.
The only way I can think of to add this externally would be to create a full fork instead of a simple wrapper, because adding functions to the existing Point/Shapelike protocols that are projection-aware can’t be done with extend-protocol. If you guys think it’s cleaner to keep this library lat/long only and avoid the possible overhead, especially as it relates to your purposes, and not dig into this mess of GIS and GIS projections, that does make a lot of sense too. I just didn't want to pre-emptively fork a library we already find very valuable!

Goals for a change:

All valid JTS geometries should work without having to think about what projection they're coming in as. If the projection isn't set, just call it lat-lng.
Transforming a JTS geometry's projection should be as easy as converting it to a spatial4j or geohash object. This means adding some sort of (to-jts) with optional srid arguments into the actual protocols.
Any implementation should minimize impact to objects that are already in lat/lng, and a good assumption for objects being created is that unless specified otherwise, the projection always defaults to lat/lng.

Ancillary benefits:

Transforming any java geospatial object into a projected JTS object and vice-versa could be fairly seamless.
This opens the door to better interop with things like PostGIS or shapefiles. Writing out a projected JTS object as EWKB should mean it could be a thing which is directly sent back to a database with a table in any projection via JDBC, versus turning it into a thing that gets read in via ST_AsText or some-such. I'm not sure of the best way to read shapefiles in and that read/write might want to live outside of geo or in a dependency or something, but that would also need to deal with data in different projections. Being able to avoid needing all of Geotools for this eventually would be quite nice.

Overall options: After playing with the protocols some more, I realize there’s actually possibly two potential paths here, at a high level.

One is to try to add something like (to-jts [this] [this srid]) exclusively to the Shapelike protocol (in other words, ignoring adding projections to #7). In some ways, this avoids some of the confusion discussed in previous comments. Assuming protocols can be inherited – which I need to read up on – this would sidestep the coordinate debate, since those aren’t actually subclasses of geometries. The catch to this approach is that this doesn’t on its own allow for the conversion of, say, a geohash WGS84Point to a projected JTS Point in one step, which is actually the very thing that led me down this path in the first place last week. We’d have to do something like (to-jts (to-jts-point geohash-point) srid), which feels a little redundant. In trying this path out, I’m actually having a hard time getting something defined as a Point to be able to take arguments set up in the protocol for Geometry, so I’m not sure it’s actually possible, but it is worth considering.
A second option is to extend it to both Point and Shapelike, as I’ve been assuming so far. This would mean that you could run something like (to-jts-point geohash-point srid) or (to-jts-geom shape-like srid), and in both cases get back a projected JTS object in one step, and excluding the srid would put it in WGS84. The catch here is that now we’d have to make a decision around Coordinates and how to handle them, and think about the functions that currently operate under the assumption that everything is in lat/lng.

No matter what, I do think that most of the projection handling can lie in the actual protocols, where they convert between different types. The main times the transformations should come into play is if the object is a JTS object in a non-standard projection, or a user wants to turn an object into one with a different projection. Here's a potential version of such a protocol for Point, also with an plan of maybe leaving Coordinate in but marking that it shouldn't be used this way:

(defprotocol Point
  (latitude [this])
  (longitude [this])
  (to-spatial4j-point [this])
  (to-geohash-point [this])
  (to-jts-point [this] [this srid]))

(defn deprecate-coordinate
  [f]
  (log/warn "Deprecated: Assuming that this coordinate uses lat/lng values. Migrate to use a JTS Point instead.")
  f)

(extend-protocol Point
  WGS84Point
  (latitude [this] (.getLatitude this))
  (longitude [this] (.getLongitude this))
  (to-spatial4j-point [this] (spatial4j-point this))
  (to-geohash-point [this] this)
  (to-jts-point ([this] (jts-point this))
                ([this srid] (jts/transform-geom (jts-point this) srid)))

  org.locationtech.jts.geom.Point
  (latitude [this] (.getY (jts/transform-geom this 4326)))
  (longitude [this] (.getX (jts/transform-geom this 4326)))
  (to-spatial4j-point [this] (spatial4j-point this))
  (to-geohash-point [this] (geohash-point this))
  (to-jts-point ([this] this)
                ([this srid] (jts/transform-geom this srid)))

  org.locationtech.jts.geom.Coordinate
  (latitude [this] (deprecate-coordinate (.y this)))
  (longitude [this] (deprecate-coordinate (.x this)))
  (to-spatial4j-point [this] (deprecate-coordinate (spatial4j-point this)))
  (to-geohash-point [this] (deprecate-coordinate (geohash-point this)))
  (to-jts-point ([this] (deprecate-coordinate (jts-point this)))
                ([this srid] (deprecate-coordinate (jts/transform-geom (jts-point this) srid))))

  org.locationtech.spatial4j.shape.Point
  (latitude [this] (.getY this))
  (longitude [this] (.getX this))
  (to-spatial4j-point [this] this)
  (to-geohash-point [this] (geohash-point this))
  (to-jts-point ([this] (jts-point this))
                ([this srid] (jts/transform-geom (jts-point this) srid))))

and for Shapelike, something like:

(defprotocol Shapelike
  (^Shape to-shape [this] "Convert anything to a Shape.")
  (^Geometry to-jts-geom [this] "Convert anything to a JTS Geometry."
                         [this srid] "Convert anything to a projected JTS Geometry."))

(extend-protocol Shapelike
  Shape
  (to-shape [this] this)
  (to-jts-geom ([this] (.getGeom this))
               ([this srid] (jts/transform-geom (.getGeom this) srid)))

  Geometry
  (to-shape [this]
    ;; Cloning geometries that cross dateline to workaround
    ;; spatial4j / jts conversion issue: https://github.com/locationtech/spatial4j/issues/150
    (let [geom (if (crosses-dateline? this)
                   (.clone this)
                   this)
          dateline-180-check? true
          allow-multi-overlap? true]
      (.makeShape jts-earth geom dateline-180-check? allow-multi-overlap?)))
  (to-jts-geom ([this] this)
               ([this srid] (jts/transform-geom this srid))))

For the functions that currently operate on JTS objects, some additional checking probably needs to happens: in resegment, there probably will need to be a step in there where the LineString is converted to WGS84 if need be. Its internal helpers like under-cap-with-next-point? can reasonably expect to only be called with things that continue to be WGS84 coordinates, so it's probably fine to ignore projections there. Things like point-between should be able to have coords switched out for Points (casting coords to points if needed to reduce breakage). Since it relies on distance, which itself relies on to-geohash-point, if the protocol is aware of projections, things should be okay.

I'm less sure about what to suggest for the functions in poly.clj. On the one hand, many of those operations seem projection-agnostic, so maybe its nice to have them work in coordinate systems. On the other hand, I don't think they're directly related to much happening in JTS. Perhaps they stay lat/lng focused for now and then additional wrappers around actual JTS functionality like union and intersection eventually get added in too? I suppose when actually testing all this out that might need to get revisited.

Let me know what you think!

worace commented 6 years ago

This is helpful context, thanks for sharing. To be clear I'm definitely interested in supporting these projection features in some way in this library. Just a question of what is the cleanest way to do it I think.

Much of our current spatial analysis, though enabled via Clojure, is highly dependent on PostGIS. Most municipal or state-level GIS data comes in in some kind of state plane projection, and there’s definitely value in doing calculations in those projections.

I'm curious how you transfer this information between PostGIS and something like this library. Do you just use WKTs that have the SRID encoded in them? Does JTS' wkt reader ingest this properly and generate a geometry in the appropriate projection? If you were trying to load that data into this library, would you be able to use the existing geo.io/from-wkt function or would you have to do some additional pre-processing on it first?

we want to analyze this projected data relative to geohashes (and spatial4j, to the degree that this library already compares between the two): which geohash does this building or set of buildings fit in? What geohashes at a given precision cover a city’s boundaries?

If you want to do this currently I assume you have to first translate your non-WGS84 JTS geometry into WGS84 before passing it into to-shape? If we wanted to integrate this smoothly I guess we would need to incorporate an SRID check into our to-shape code and do the translation first if necessary?

One is to try to add something like (to-jts [this] [this srid]) exclusively to the Shapelike protocol (in other words, ignoring adding projections to #7).

This is easy to do and since it would be a new, standalone addition to the library there's not much concern over compatibility with existing stuff, so we should almost certainly do it. I think the question of whether it needs to go into the Shapelike protocol is actually a little more nuanced. The main benefit of the protocol is allowing us to do automatic conversions on users' behalf (for example someone wants area of a geohash and we can . If (to-jts [this] [this srid]) is going to be used in this way then adding it to the protocol probably makes sense. But if it's something users would be calling directly from their own code then it could also just be a new standalone function we add somewhere.

I think I'm still mostly curious about whether we'd need to do translation of some geometries when converting them to shapes... e.g. if someone asks for area of a JTS geom that's not in WGS84, would we need to do some check here:

(extend-protocol Shapelike
  Shape
  (to-shape [this] this)

  Geometry
  (to-shape [this]
    ;; Cloning geometries that cross dateline to workaround
    ;; spatial4j / jts conversion issue: https://github.com/locationtech/spatial4j/issues/150
    (let [geom (if (not-wgs84? geom) (jts/transform-geom geom 4326) geom)
          geom (if (crosses-dateline? this)
                           (.clone this)
                           this)
          dateline-180-check? true
          allow-multi-overlap? true]
      (.makeShape jts-earth geom dateline-180-check? allow-multi-overlap?))))

Do you think that is how you would expect it to behave? or you would expect to do that conversion on your own before passing it into functions that will require conversion to shape?

willcohen commented 6 years ago

Great!

For now, since everything is lat long here, I'm just using WKTs and transforming things into and out of 4326 in PostGIS. I think WKT is always projection-less, so on postgis it's something like ST_SetSRID(ST_GeomFromText(wkt), 4326), and potentially wrapping that in another ST_Transform(). PostGIS has a SRID-aware extension of WKB called EWKB, which JTS can write and read.

I have another project deadline the next few days, but unless you're itching to get this 2.0 release out immediately, it'd be great for me to do a little homework with respect to all this. If you're okay with waiting on 2.0 until projections get figured out, I agree that it makes sense to think it through into full use cases like this.

I'm pretty sure PostGIS returns an EWKB or something close to it when asking for a geometry – try SELECT geom, ST_AsEWKB(geom) FROM exampletable, unless it's just making them appear similar for display for an end user – so as long as JDBC returns something similar, as long as that can eventually get into an object type that the JTS WKBReader can work with, I think it would be able to read them directly-ish. There'd probably be a little bit of change to geo.io to handle projections too, but nothing big.

Lastly, you're right that to-shape needs a few checks here and there, and that's a good catch. I forgot to add in the previous comment that the conversion functions would need a few modifications to handling their inputs as well, in addition to the actual protocol changes either way.

Thanks!

worace commented 6 years ago

I guess they also have a WKT version which includes SRID which they call EWKT, but no idea if this is a widely supported format or not: http://www.postgis.net/docs/ST_AsEWKT.html

I'm not in any particular rush to get out the 2.0 release so I'm happy to come back to it next week if you like. If I can find some time I will do a bit of research into reading EWKB / EWKT from JTS as well

willcohen commented 6 years ago

@worace I updated the PR to operate on Shapelike instead of Point. This totally sidesteps the whole Coordinate issue and seems to work well, since Geometry trickles down to JTS's Point. I also added WGS84Point to the extended Shapelike in geohash.clj, so I think that covers pretty much everything. The spatial4j Circle, I don't think, can be directly cast to a JTS geometry, so I think we only need to cover Rectangle and Point as seen in the PR.

I still haven't dug deeper into the EWKB issue but if this approach to transformations looks good, I can start thinking about that now as well.

willcohen commented 6 years ago

@worace Looks like PostGIS could integrate quite nicely. I think some pretty small steps would be sufficient to integrate. The actual database back and forth should definitely live outside this library, and we can add helper functions to get in and out of postgres objects.

Additionally, we could consider extending Shapelike to PGobject and then adding a corresponding (to-pg-geom) or something like it, which would mean that users could query a spatial table and act on what is returned or add/update rows directly, without having to wrap all geometry columns in conversion functions. While a postgres object does straddle the line between being a pure input-output object type like WKT or WKB, the fact that it's actually a java object means it might fit within the protocol's role.

Potential PostGIS approach:

Add [org.postgresql/postgresql] as a dependency, so that the library is aware of native objects.
Add helper functions to geo.io that perform the read and write from/to PGobjects
(Possibly) extend Shapelike using PGobject, so they can easily be shapes and geometries after a query. This, then leads to the question of whether or not to...
(Potentially) add a corresponding (to-pg-geom) and perhaps (to-pg-geog) to Shapelike, to simplify sending objects back to a database. This might be getting too far in terms of making the protocol be about input/output, since it'd be a little like adding (to-geojson), but it is a native object for anyone dealing with JDBC.

PGobject IO

Running a PostGIS query via JDBC that creates a geometry: SELECT ST_SetSRID(ST_GeomFromText('POINT (10 10)')) AS geom, returns an object like this:

#object[org.postgresql.util.PGobject 0x7efa9ea1 "010100000000000000000024400000000000002440"]

We can actually test that these objects are postgis geometries before doing operations on them:

(.getType obj-from-that-query)
=> "geometry"

They can also be converted to hex strings using postgres jdbc:

(.toString obj-from-that-query)
=> "010100000000000000000024400000000000002440"

Unprojected Point

The WKB reader can turn those hex strings to JTS:

(.read wkb-reader (WKBReader/hexToBytes "010100000000000000000024400000000000002440"))
=> #object[org.locationtech.jts.geom.Point 0x515c6a62 "POINT (10 10)"]

A geometry without an SRID returns 0:

(.getSRID (.read wkb-reader (WKBReader/hexToBytes "010100000000000000000024400000000000002440")))
=> 0

Projected Point

With a projected geometry, SELECT ST_SetSRID(ST_GeomFromText('POINT (10 10)'), 3586) AS geom:

#object[org.postgresql.util.PGobject 0x5c78b615 "0101000020020E000000000000000024400000000000002440"]

(.read wkb-reader (WKBReader/hexToBytes "0101000020020E000000000000000024400000000000002440"))
=> #object[org.locationtech.jts.geom.Point 0x2dfb74a4 "POINT (10 10)"]

(.getSRID (.read wkb-reader (WKBReader/hexToBytes "0101000020020E000000000000000024400000000000002440")))
=> 3586

Unprojected Polygon

For an unprojected polygon, SELECT ST_GeomFromText('POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))') AS geom:

#object[org.postgresql.util.PGobject
        0x56d5bae3
        "0103000000010000000500000000000000000000000000000000000000000000000000F03F0000000000000000000000000000F03F000000000000F03F0000000000000000000000000000F03F00000000000000000000000000000000"]

(.read wkb-reader (WKBReader/hexToBytes "0103000000010000000500000000000000000000000000000000000000000000000000F03F0000000000000000000000000000F03F000000000000F03F0000000000000000000000000000F03F00000000000000000000000000000000"))
=> #object[org.locationtech.jts.geom.Polygon 0x69044e73 "POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))"]

Projected Polygon

And lastly, a projected polygon, SELECT ST_SetSRID(ST_GeomFromText('POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))'), 3586) AS geom:

#object[org.postgresql.util.PGobject
        0x3c811f80
        "0103000020020E0000010000000500000000000000000000000000000000000000000000000000F03F0000000000000000000000000000F03F000000000000F03F0000000000000000000000000000F03F00000000000000000000000000000000"]

(.read wkb-reader (WKBReader/hexToBytes "0103000020020E0000010000000500000000000000000000000000000000000000000000000000F03F0000000000000000000000000000F03F000000000000F03F0000000000000000000000000000F03F00000000000000000000000000000000"))
=> #object[org.locationtech.jts.geom.Polygon 0x67092561 "POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))"]

(.getSRID (.read wkb-reader (WKBReader/hexToBytes "0103000020020E0000010000000500000000000000000000000000000000000000000000000000F03F0000000000000000000000000000F03F000000000000F03F0000000000000000000000000000F03F00000000000000000000000000000000")))
=> 3586

willcohen commented 6 years ago

I've gone through and added a bunch of tests, both to convert to and from JTS objects, as well as to check that both single points as well as geometries are able to be transformed between projection systems.

I think the best path with respect to PostGIS can be seen in #15, which depends on this. It tables all of the protocol questions for now and just starts with a to-postgres and from-postgres.

I think this PR is looking good to merge. Let me know what else you’d like to see. Happy to rebase off of the other outstanding PRs depending on the order you want to go in.

@worace

worace commented 6 years ago

@willcohen I haven't been able to dig through too carefully but at first glance this looks pretty good! Thanks for all the work that went into it.

Unfortunately I am traveling for the next 10 days or so, so I may be a little slow to respond in more depth. But I think this looks like a good direction and I just want to take some time to read it more closely before merging.

As for the postgres stuff I'd be interested to discuss it further as a separate issue once we have this batch of changes resolved. My initial reaction is that I'm skeptical about adding any direct connections to postgresql or JDBC from this library as I think it's kind of beyond the scope of what we're providing.

But it seems like once we provide the easy interface via EWKB that it should be pretty easy for users to integrate with their own DB code and maybe that solves 90% of the problem for most people anyway.

willcohen commented 6 years ago

Sounds good. I'll think a little more about the postgres stuff too, and I agree that it may well be beyond the scope of this library in the end, especially if we're only talking about two functions. That said I should note that it doesn't actually require clojure.java.jdbc, only the postgres driver for it, which is self-contained.

willcohen commented 6 years ago

I closed #15 in favor of just adding some hex string IO here, which doesn't need any new dependencies. Like you suggest, this does get the conversion 90-ish% to being ready for postgis. I'll try to put this through its paces a little bit and try it out on a real database a little more fully to make sure it works out well and if there's anything else that should happen here without expanding the library's scope too much.

Once all this stuff lands, there might be value in thinking through some kind of 2.0-beta release to allow people to put this through its paces before a major version bump, given the scope of changes since 1.2. @worace

willcohen commented 6 years ago

I made a few changes to extend transformations to allow not only EPSG codes but also proj4 strings in general. Based on that I was able to test this out using JTS geometries read in from a shapefile. There is sufficient functionality to first transform such a JTS object from its arbitrary proj4-based projection (as long as the user is able to identify that on their own) to a specific EPSG SRID for easier PostGIS interoperability, and from there to convert that into the appropriate EWKB hex string, which a user can then on their own figure out how to turn into a postgres object fairly simply.

willcohen commented 6 years ago

@worace are there other changes you'd like to see here?

worace commented 6 years ago

@willcohen sorry I have been slow to get back to this; have been busy catching up after vacation. Will try to look at this more closely in the next couple days.

willcohen commented 6 years ago

No problem, I hope you had a good vacation! Just wanted to keep this moving.

willcohen commented 6 years ago

I made changes in the latest commit, addressing everything but the proj namespace. Since JTS objects do have a place for SRIDs, some information about projections should still want to live there, like get-srid and set-srid. Similarly, I think the only way to remove the circular dependency may be to change transform-coord to drop down to java to make a coordinate, instead of using the (coordinate) function in geo.jts. While we could do that, it does seem like the library should be consuming its own functions whenever possible instead of duplicating functionality. We can still think about it some more, but I'm having a hard time drawing a consistent line of what would be jts and what would be proj.

Additionally, I can't find any built-in functions for checking whether or not a coordinate has a Z value, only what dimensions the geometry itself is -- a point in 3 dimensions and a point in 2 dimensions both have a ".getDimension" of 0. The only reason why this even matters is that the WKBWriter that is SRID-aware requires that we specify how many dimensions the geometry has, and I think that may always require just looking at Z values for each coordinate in the geometry for now.

Lastly, in terms of additional functionality cluttering the namespace, one mitigating factor may just be making as much private as possible. (transform-geom) is the most important function, and I think the helper functions I left public are still useful in general. As long as the proj4j interoperability stays private (making and applying the transforms, mostly), a lot of the new functionality can stay hidden-ish inside the library. More importantly, if proj4j changes its namespace once it's done incubating with locationtech, there won't be a need for a major version change again. I like the idea that for the purposes of this library, the only way "in" to working with projections is to provide SRIDs, projection names, or proj4 strings, rather than exposing all of the projection machinery and transformation objects.

@worace

willcohen commented 6 years ago

This pulls in the proj4j-review branch changes, removing 3d etc. I fixed up the tests from that -- the main issue was that I think point needs to have arities [coordinate], [lat long] (instead of [x y]), and [x y srid] to be consistent with the other functions and for the protocols to work cleanly. Besides that, I changed Shapelike to operate on all the specific object types instead of Shape, added tests about the CRS names and exception throwing, removed the 0 check for same-srid?, and noted that in the doc strings for that and same-geom?.

Based on the Monday comments, I think the only thing unaddressed is the idea about the potential srid? function based on what's in transform-geom? It's not functionality that's appearing regularly so I think it's okay to leave transform-geom the way it is for now. @worace

willcohen commented 6 years ago

Realizing I missed a final line ending in gitignore, just pushed that in as well today. Do you have additional questions about the handling around SRIDs/EPSGs/proj4 names/proj4 strings?

@worace

worace commented 6 years ago

@willcohen no, i think it's looking pretty good; Thanks for working on that last round of updates. Sorry I have been silent on this but I think I will merge it in the next few days. Then I'll probably work on getting a couple more things merged and doing a bit of refactoring and extra testing in preparation for a 2.0 alpha release of some sort.

willcohen commented 6 years ago

Great! Let me know if you need anything else on this one.

Factual / geo