georust / proj

Rust bindings for the latest stable release of PROJ
https://docs.rs/proj
Apache License 2.0
137 stars 45 forks source link

Initialize `zt` before calling `proj_trans` #104

Closed lnicola closed 2 years ago

lnicola commented 2 years ago

CC #99

r? @michaelkirk

michaelkirk commented 2 years ago

bors try

bors[bot] commented 2 years ago

try

Build succeeded:

lnicola commented 2 years ago

To expand on the "not strictly correct" remark: in C++ (unlike C) you're not allowed to read a different union member than the one you've written to last.

But proj_trans calls pj_get_suggested_operation which reads xyzt, then pj_fwd4d -> fwd_prepare, which reads v. Finally, it calls through a function pointer which reads either xyzt (I suppose), xyz or xy, depending on what's available in PJ.

So we can only hope for the best here.

lnicola commented 2 years ago

bors try

bors[bot] commented 2 years ago

try

Build succeeded:

michaelkirk commented 2 years ago

I'm not sure if you saw, but I had a PR that similarly juggled with the union type and produced passing tests. But as I didn't understand how it fixed things, I figured I just fixed the tests, not the code... that the bug was just as likely to emerge for other peoples use cases.

Your c++ union observation is interesting, but it's still not clear to me how it fixes anything.

lnicola commented 2 years ago

I think that PR is orthogonal to this one. This one is not fixing UB caused by reading the wrong union field.

This PR initializes z and t before passing them to proj. I don't know if that's happening, but I can imagine how passing NaN as the time might result in NaN as the output. See https://github.com/OSGeo/PROJ/blob/8.1/src/fwd.cpp#L195-L200.

michaelkirk commented 2 years ago

Sorry to be slow to respond to this...

This PR initializes z and t before passing them to proj. I don't know if that's happening, but I can imagine how passing NaN as the time might result in NaN as the output. See https://github.com/OSGeo/PROJ/blob/8.1/src/fwd.cpp#L195-L200.

I'm not sure what's significant about the code you linked to.

if (P->fwd)
        coo.xy = P->fwd(coo.lp, P);
    else if (P->fwd3d)
        coo.xyz = P->fwd3d (coo.lpz, P);
    else if (P->fwd4d)
        coo = P->fwd4d (coo, P);

Presumaby we're hitting this branch:

 if (P->fwd)
        coo.xy = P->fwd(coo.lp, P);

You think that somewhere in that branch proj is reading something other than the first two floats in coo?

lnicola commented 2 years ago

I don't know how to check, but the first result from projinfo has a Helmert step with a non-zero +z parameter, so it might be at least 3D.

lnicola commented 2 years ago

So:

$ cs2cs EPSG:4326 EPSG:3309
34.095620 -118.283555 0
158458.67   -434296.88 0.00
34.095620 -118.283555 1000
158458.66   -434296.88 1000.00
34.095620 -118.283555 2000 
158458.65   -434296.88 2000.00

It's small, but notice how the output x depends on z. If z was NaN, it might propagate over to it.

busstoptaktik commented 2 years ago

In general, you should always expect geodetic operations to be four dimensional - the exception being when going from geographical (or perhaps projected) coordinates on the ellipsoid, to (other) projected coordinates on the same ellipsoid and in the same datum: In these cases, most 2D systems are more or less equivalent: From a geodetic point of view, it doesn't matter whether you have geographical or UTM coordinates: As long as they are referred to the same datum/reference frame, you can always convert one to the other at arbitrary precision, since you are doing entirely mathematical operations in an ideal mathematical world.

Once you need to convert between different reference frames, geophysics and the irregularities of the real world enters and you need to acknowledge that the world is fundamentally 4 dimensional. When you work in two dimensions, you are limited to either stay in one reference frame, or accept a probable overall accuracy at the metre level (which of course is quite satisfactory for many - or most- real world applications)

lnicola commented 2 years ago

Thanks for chiming in, @busstoptaktik.

In this case (our failing test converts EPSG:4326 to EPSG:3309), the two seem to use different ellipsoids, and even with my black-box knowledge of libproj, it seems bad practice to pass uninitialized variables without a guarantee that they are not going to be used.

lnicola commented 2 years ago

@michaelkirk how do you feel about merging this? We could try to rebase #99 on top of it to see if it helped.

michaelkirk commented 2 years ago

Thanks for your investigation @lnicola and thanks to @busstoptaktik also for adding some context!

One of the things I was caught up on was why our code would be broken, when the equivalent code from c presumably isn't — why don't people writing C programs have to jump through the hoops in this PR?

And I think the answer is that I don't know very much about C.

From (from https://en.cppreference.com/w/c/language/struct_initialization)

All members that are not initialized explicitly are zero-initialized

I don't think rust makes any such guarantees - the z/t dimensions could be whatever garbage was left in them.

Based on this... I wonder if we should be initializing t to zero:

- t: f64::INFINITY,
+ t: 0.0

...to be in line with what proj expects to gets from a c user of the "two dimensional" api. WDYT?

We could try to rebase https://github.com/georust/proj/pull/99 on top of it to see if it helped.

Good idea - I've just merged your branch into #99 here: d1eaaee

Let's make sure that extended CI passes, then I'm fully on board with merging this!

lnicola commented 2 years ago

Based on this... I wonder if we should be initializing t to zero:

I think it's safe to do what libproj does, i.e. use INFINITY: https://github.com/OSGeo/PROJ/blob/8.1/src/4D_api.cpp#L492-L510.

michaelkirk commented 2 years ago

Ok, let's do it then!

bors r+

Thanks again!

bors[bot] commented 2 years ago

Build succeeded:

busstoptaktik commented 2 years ago

@michaelkirk said:

I wonder if we should be initializing t to zero

zero would indicate that the valid time of the coordinate is at the current epoch, i.e. approx 2000 years ago. Since that is probably not the case for any majority of digital geographic resources, it may be better to select a more current ballpark estimate - say, 2000.0, if you do not have any better estimate?

At an average geological plate velocity of 3 cm/y, the 2000 year evolution would translate into a 60 m dscrepancy, whereas the 20-something years of selecting 2000 as ballpark estimate would only translate into 60 cm.

lnicola commented 2 years ago

Oh, is time really used like that?

I assumed it was used only to select an appropriate transform when the definitions changed in the past. And I thought using infinity made sense, in order to pick the most recent definitions available.

michaelkirk commented 2 years ago

I appreciate your bringing up how important the time dimension is @busstoptaktik. We should add support for that to georust/proj. But I'd like to make sure we're on the same page here with this most recent change...

The problem that this PR aimed to address was wildly inconsistent results from georust/proj. e.g. see here https://github.com/georust/proj/runs/5013324366?check_suite_focus=true where tests sometimes fail (and sometimes don't).

This repository only supports 2-D points. It's possible to use libproj with 2-D points and get consistent results (though it clearly can't account for plate shifting with only 2-D points). Right?

But the errors we were seeing aren't even at the level of geological plate shifting — it's about us making the wrong assumptions about how rust interacts with C.

Thanks to @lnicola, we now have a theory and a merged (presumed) fix for the gap in behavior between what a user of georust/proj would see vs a direct consumer of libproj.

I'd like to ship out this fix as soon as possible, so existing users can get the fix and so that we can move on to other improvements (like who knows, maybe 4D points!)

But first I want to hammer down the lingering doubts raised in this thread:

zero would indicate that the valid time of the coordinate is at the current epoch, i.e. approx 2000 years ago.

Is this really the case for 2-d users of libproj (e.g. someone using PJ_COORD { xy: }. libproj assumes they want 2000 year old data? That would be such a sad choice for the very large group of your users that only use 2-d points, who almost surely want something closer to today than to something 2000 years ago.

In any case, my feeling is that as "a wrapper for libproj", georust/proj is obliged to offer the same assumptions that libproj offers. It'd be a surprising bug if people are getting different results from libproj than they are from georust/proj.

(I don't think that a new library like https://github.com/busstoptaktik/geodesy should be doomed to carry forward any historical baggage like that though)

Based on this... I wonder if we should be initializing t to zero:

I think it's safe to do what libproj does, i.e. use INFINITY:

If I'm reading this correctly, when encountering z: INFINITY or t: INFINITY within proj_trans, it get's normalized to 0 anyway. 🙃

https://github.com/OSGeo/PROJ/blob/master/src/fwd.cpp#L49

    if (HUGE_VAL==coo.v[2] && P->helmert) coo.v[2] = 0.0;
    if (HUGE_VAL==coo.v[3] && P->helmert) coo.v[3] = 0.0;

Sooooo, anyone opposed to releasing to crates.io?

busstoptaktik commented 2 years ago

Is this really the case for 2-d users of libproj (e.g. someone using PJ_COORD { xy: }. libproj assumes they want 2000 year old data?

No, PROJ does not assume anything, but if you call a dynamic transformation with garbage data, you will get garbage answers. The main problem is the widespread use of EPSG:4326 as "don't care" value: Since WGS84 is a global datum, any transformation into plate-fixed coordinate systems (like ETRS89 in Europe, NAD83 in North America), will be through a dynamic transformation.

As long as you stay inside the same datum, and only change from (say) geographical coordinates to (say) UTM, there is no problem. The same is the case for transforming between different generations of plate fixed reference frames (e.g. ED50 to ETRS89): No problem at all, since the transformation is not time dependent.

So in conclusion: This PR solves a real problem, and should obviously be pushed, but it does not solve all potential problems: You can only do that by having a strict 4D processing pipeline, and I do not think this will be useful at all for most of the use cases of GeoRust.