Closed willcohen closed 5 years ago
@worace any thoughts on this PR?
Makes sense. I think I'd need to add Integer
as well since transform-geom
lets us use EPSG codes as integers as well. It shouldn't be too complicated but this might mean re-working the logic of transform-geom to deal with the int/string split a little earlier on. I'll try to look at this over the next few days.
I'm not actually sure what the overhead of the instance?
is relative to these actual operations are, so I have no idea what the performance improvement will end up being, but it doesn't seem like it'd hurt anything in any case.
@worace I've been working through this a few different ways and am still not sure if it'll end up having enough of a performance increase to be worth the added complexity, but I still like the idea. Either way, I think it should hold off until after #44, since this PR would actually make crs
's create-transform
much more commonly used from a public API perspective, and that's the only part of the existing API that actually changes with that dependency change.
@willcohen got it. well thanks to looking into it. I'm happy with this way if the other approach is going to be a huge pain. I'll check out that other PR first though.
@worace There's now a Transformable
protocol in the crs
namespace. Certainly open to changing the name. The first upside to the protocol is that transformations can work from origin-target where either is a Long
, Integer
, String
, and now CoordinateReferenceSystem
, which is an improvement.
There's also a more consistent way of trying to deduce the SRID for any of those 4 objects, which should increase the number of cases where the SRID can be correctly set at the end. (See the added number of cases for get-srid
in t_jts
. There's currently now both a jts/get-srid
, which works on Geometry
, and a crs/get-srid
, which works on Transformable. The overlap doesn't seem problematic to me, but let me know if you'd prefer a different name.)
I also rebased this pre-emptively on #44, assuming that will land first.
The one remaining consideration is that this still currently has an instance?
and satisfies?
check on transform-geom
. This is because the two-arity has a little ambiguity. I think we want to enable two things -- first, if t
is a Transform itself, just apply that; second, if t is a Transformable target, then we should try to transform the geom from its current CRS to the new one. The only way to avoid that comparison is to try to add CoordinateTransform
to our the new Transformable
protocol, but that feels a little strange to me. Either way, the previous logic of all of the branches in transform-geom
now lives in create-transform
.
Unscientific benchmark updates:
With this new protocol, the creation of transforms and application to geoms is similar but now actually a little slower -- I'm not sure, but this overhead may be due to the fact that there's now more functionality available in what can be passed to that transformation and how well it interprets and sets the target SRID via crs/get-srid
, which seems like an acceptable tradeoff to me. If you have suggestions for how to make this more efficient, I'd be happy to give it a shot!
Before protocol: 8ms
(bench
(crs/create-transform 4326 23031))
Evaluation count : 7440 in 60 samples of 124 calls.
Execution time mean : 8.088982 ms
Execution time std-deviation : 122.779458 µs
Execution time lower quantile : 7.890932 ms ( 2.5%)
Execution time upper quantile : 8.374483 ms (97.5%)
Overhead used : 7.551538 ns
Found 2 outliers in 60 samples (3.3333 %)
low-severe 2 (3.3333 %)
Variance from outliers : 1.6389 % Variance is slightly inflated by outliers
After protocol: 8.5ms
(bench
(crs/create-transform 4326 23031))
Evaluation count : 7080 in 60 samples of 118 calls.
Execution time mean : 8.557325 ms
Execution time std-deviation : 168.217638 µs
Execution time lower quantile : 8.285679 ms ( 2.5%)
Execution time upper quantile : 8.885056 ms (97.5%)
Overhead used : 7.790776 ns
Before protocol: 475 ns
(let [t (crs/create-transform 4326 23031)]
(bench (transform-geom (point 3.8142776 51.285914) t)))
Evaluation count : 125127960 in 60 samples of 2085466 calls.
Execution time mean : 479.239794 ns
Execution time std-deviation : 7.021314 ns
Execution time lower quantile : 468.222229 ns ( 2.5%)
Execution time upper quantile : 492.111577 ns (97.5%)
Overhead used : 7.551538 ns
After protocol: 495 ns
(let [t (crs/create-transform 4326 23031)]
(bench (transform-geom (point 3.8142776 51.285914) t)))
Evaluation count : 123639060 in 60 samples of 2060651 calls.
Execution time mean : 495.077556 ns
Execution time std-deviation : 9.900904 ns
Execution time lower quantile : 475.391011 ns ( 2.5%)
Execution time upper quantile : 509.360320 ns (97.5%)
Overhead used : 7.790776 ns
Found 1 outliers in 60 samples (1.6667 %)
low-severe 1 (1.6667 %)
Variance from outliers : 7.8812 % Variance is slightly inflated by outliers
Before protocol: 20 microseconds
(let [t (crs/create-transform 4326 23031)]
(bench (transform-geom (multi-polygon-wkt [[[10 10, 80 10, 80 80, 10 80, 10 10],
[20 20, 20 30, 30 30, 30 20, 20 20],
[40 20, 40 30, 50 30, 50 20, 40 20]]]) t)))
Evaluation count : 2957760 in 60 samples of 49296 calls.
Execution time mean : 19.682421 µs
Execution time std-deviation : 445.235498 ns
Execution time lower quantile : 19.016432 µs ( 2.5%)
Execution time upper quantile : 20.577034 µs (97.5%)
Overhead used : 7.551538 ns
Found 1 outliers in 60 samples (1.6667 %)
low-severe 1 (1.6667 %)
Variance from outliers : 10.9740 % Variance is moderately inflated by outliers
After protocol: 21 microseconds
(let [t (crs/create-transform 4326 23031)]
(bench (transform-geom (multi-polygon-wkt [[[10 10, 80 10, 80 80, 10 80, 10 10],
[20 20, 20 30, 30 30, 30 20, 20 20],
[40 20, 40 30, 50 30, 50 20, 40 20]]]) t)))
Evaluation count : 2998200 in 60 samples of 49970 calls.
Execution time mean : 20.966107 µs
Execution time std-deviation : 377.768999 ns
Execution time lower quantile : 20.049906 µs ( 2.5%)
Execution time upper quantile : 21.608664 µs (97.5%)
Overhead used : 7.790776 ns
Found 3 outliers in 60 samples (5.0000 %)
low-severe 3 (5.0000 %)
Variance from outliers : 7.7763 % Variance is slightly inflated by outliers
@worace I think this is looking better now -- I'd forgotten that proj4j can use .getSourceCRS
to try to pull CRS identifiers out of CoordinateTransform
s, which means that the logic for checking what SRID to set can be made a little simpler.
This also means that the structure for tf
and transform-geom
could be streamlined -- the source/target arity version of create-transform
can be reduced to calling the function with a generated transform, and tf
can always try to figure out the SRID even though it only accepts a CoordinateTransform
.
It also means that jts/set-srid
should be able to accept not just an integer as its coordinate argument, but any kind of Transformable
where crs/get-srid
can be used.
Looks great to me. Nice work with all the protocol stuff. I am not as concerned with the perf aspects as with the structure and avoiding lots of instance?
calls for different use cases. But thanks for looking into the perf stuff anyway.
Current functionality of
transform-geom
either allows for 1) providing a single target CRS or 2) forcing a transformation between two CRSs, but either way this invokes the creation of a CoordinateTransform every time, which is (by far) the most computationally expensive operation. In cases where there's a bunch of geometries to be transformed in the exact same way (all the objects coming out of a shapefile or other geojson, for example), it'd be much more efficient to be able to pass a single transformation and apply that one transform to each. This PR changestransform-geom
's two-argument arity to work with either a target CRS or a CoordinateTransform.Creating a single transform takes multiple milliseconds.
Applying an existing transform to one point takes less than 1 microsecond:
Applying an existing transform to a multipolygon with multiple coordinates, still takes something on the order of 1 microsecond per coordinate: