RDP filter - Githubissues

Would you accept a pull request that adds an implementation of the RDP algorithm?

An implementation that uses the shortest distance between a point and a geodesic line segment (or alternatively a Rhumb line segment?) is relatively straightforward, but there are some small caveats; hence, having a robust implementation here can save others some time scratching their head in a debugger session. However, I think this feature could come in handy in libraries such as pyproj that wrap PROJ but somehow it feels more natural to implement this feature once here instead of multiple times in different wrappers. What do you think?

GEOS already has a DouglasPeuckerSimplifier algorithm. Is the RDP algorithm you propose different?

As for your second consideration for shortest distances using geodescis and Rhumb lines, these are geographic operations. As far as I know, the common Douglas–Peucker algorithms assume projected 2D Cartesian space, not geographic space. GEOS mostly operates with 2D projected coordinates (occasionally 3D orthogonal coords). ~~It might be difficult to convince others to consider geographic coordinates.~~ (Update: too many browser tabs were open, some of them were GEOS-related and I was mixed up. No need to convince PROJ devs about geographic measures)

There are a few geographic-oriented libraries available. The main one is GeographicLib's C++ implementation. There are some (e.g.) Python bindings with geofun to work with geodesic and Rhumb lines.

An implementation that uses the shortest distance between a point and a geodesic line segment (or alternatively a Rhumb line segment?) is relatively straightforward, but there are some small caveats

@avitase: I could find this quite useful, e.g. for analysis and design of national border delineations, which are typically given as series of geodesic segments.

Architecture

To get this into the PROJ code base, I believe that the path of least resistance would be to first and foremost provide the essential "perpendicular distance to geodesic" building block, which fits smoothly into the Coordinate Operation concept, without having to extend the API surface. Perhaps using a syntax like

proj=rdp  lat_0=55 lon_0=12  lat_1=59 lon_1=18  ellps=GRS80

i.e. defining the geodesic segment using lat_*/lon_* pairs, then feeding the coordinates through like in any other coordinate operation case.

Output

The next question would be what the output of the rdp-operation should be. Obviously one of the output components would be the distance from the point to the geodesic. But I assume that (depending on your implementation strategy), additional user-useful information could be conveyed.

Personally, I could find use for the coordinates of the intersection between the geodesic, and its perpendicular-to-the-input-point, i.e. a 3D output in the format (longitude, latitude, distance). Unit-wise, this would fit with the structure of an ordinary (latitude, longitude, height) tuple, i.e. (angle, angle, length).

It would, however, fit poorly with 2D filters like the proj command line tool. And since I suspect that what actually comes naturally out from your algorithm may very likely be the distance along the geodesic segment, it would probably be more sane to provide output as a (distance-to-geodesic, distance-along-geodesic) tuple, i.e. 2D with units (length, length), which is perfectly in line with the expected output from the proj tool.

Inversion

Can this be trivially inverted, i.e. computing the geographical coordinates of the "point a km from the geodesic segment, which perpendicularly intersects it b km from the segment origin"?

Rationale

As already mentioned, implementing the distance-to-geodesic-segment rather than the entire RDP algorithm, has the advantage of not having to extend the PROJ API and architecture: It fits neatly with PROJ's traditional "streaming mode" processing model, where one point is considered at a time, i.e. the coordinate operation building blocks are sequentially served their input points from an external data flow architecture.

They do not have any memory of past points passed, and hence cannot carry out the reduction part of the RDP algorithm: This would require a buffering API, taking as input a set of N coordinate tuples, and returning another set of M coordinate tuples. This functionality does not currently exist in PROJ, and it probably would be a poor fit, architecturally and API-wise.

So as you already suggest, the "full algorithm" implementation (which is trivial once having your distance-to-geodesic available) fits better into PROJ-wrapping libraries such as pyproj.

I agree fully with your statement that it feels more natural to implement this feature once here instead of multiple times in different wrappers

General architectural remark

In general, I believe that a large number of geodetical operations can meaningfully fit into PROJ's point-sequential coordinate operation architecture. In Rust Geodesy, I have e.g. implemented geodesics, radii-of-curvature, and auxiliary latitudes as plain operators.

As one of the aims of Rust Geodesy is to be a playground for potential PROJ improvements, these are obviously something I think should be considered for PROJ, and your RDP suggestion is very much in line with this.

GEOS already has a DouglasPeuckerSimplifier algorithm. Is the RDP algorithm you propose different?

@mwtoews, thank you for your comment! You are right, there are implementations of RDP out there in the wild. However, what I propose here is an implementation that uses a different metric to decide whether a point is (in)significant. For short distances on Earth, I don't expect any difference between an RDP operating on a Mercator projection using the canonical cartesian distance or the accurate geodesic distance as the distance is only used as a threshold value. Still, if the distance between points becomes large, there will be a difference and I don't see any good reason to not use the right metric here. (Whatever right means depends on the context, for example, the genuine movement of a vessel between points could be a Rhumb line, and for a plane, it could be a geodesic.)

There are a few geographic-oriented libraries available. The main one is GeographicLib's C++ implementation. There are some (e.g.) Python bindings with geofun to work with geodesic and Rhumb lines.

I know, The question is whether this is something that should be part of PROJ.

[...] provide the essential "perpendicular distance to geodesic" building block, which fits smoothly into the Coordinate Operation concept, without having to extend the API surface. [...] I suspect that what actually comes naturally out from your algorithm may very likely be the distance along the geodesic segment, it would probably be more sane to provide output as a (distance-to-geodesic, distance-along-geodesic) tuple, i.e. 2D with units (length, length), which is perfectly in line with the expected output from the proj tool.

@busstoptaktik, I really like this idea, thanks for the suggestion! In fact, I already have written a library (in Python) that does exactly this: github.com/avitase/geordpy/. There, my naive implementation only needs the (absolute) distance but estimating the distance on the segment actually comes for free and could thus be yielded as well. Once such a distance function is available, implementing the RDP algorithm can indeed be done with a few LoCs.

Inversion

The inversion is the canonical direct geodesic problem if one can disambiguate the point at +90° and -90°, right? I conjecture that this could be done by assigning a sign to the distance.

Open question I have the hunch that approximating geodesic distances with great circles is sufficient for the RDP algorithm. The benefit here is that AFAIK, it is easier to estimate (@cffk, please prove me wrong). For an accurate estimation of geodesic distances, one could use a Gnomonic projection but this comes with the downside that this projection is limited to distances (approximately) smaller than the quarter of Earth's radius. I don't know if this is an issue in real-world applications and maybe there are other solutions to the problem.

I'm on vacation right now. I'll respond once I've returned (Sept 8).

I found some answer(s) to my open question(s):

arXiv:2308.00495: In Appendix B @cffk shows how to improve the results by Baselga and Martínez-Llario on how to find the minimum distance from a point to a geodesic (segment). Unfortunately, I don't speak Russian and don't know if the issues reported by Botnev and Ustinov are relevant here as well.
For finding the shortest distance to a Rhumb line segment the Mercator projection comes to mind. Cut-offs near the poles shouldn't be an issue in practice as I don't expect any plane or vessel to follow Rhumb lines near the poles. However, this isn't a clean solution and I am not 100% satisfied with it; in particular, it is not possible to find the distance to a point near the poles even if the Rhumb line itself is sufficiently far away! An alternative could be to invert Eq. (47) and use it to iteratively find the spot at which the geodesic (to the point) intersects the Rhumb line perpendicularly using Newton's method. For this to work one has to restrict the Rhumb line segments to encircle the ellipsoid at most half. If longer segments are needed one has to chop them into shorter pieces and apply the algorithm to each of them. When using Eq. (47) for the derivative, it should be fairly easy to detect if the shortest point sits on the boundary of the segment, as there is either exactly one minimum or one maximum on restricted segments. (Maybe there is even a convex formulation of the problem?)

OSGeo / PROJ

RDP filter #3865

Architecture

Output

Inversion

Rationale

General architectural remark