gbif / pipelines

Pipelines for data processing (GBIF and LivingAtlases)
Apache License 2.0

CoordinatePrecision does not seem to be used during reprojection #517

Open charvolant opened 3 years ago

charvolant commented 3 years ago

The DwC coordinatePrecision term gives the precision of the decimalLatitude/decimalLongitude values. When the lat/long is re-projected, I would expect the re-projected result to be rounded to the precision of the original values, assuming coordinatePrecision is present.

As an example, if a transformation converts -37.327653 lat, 147.334210 long (coordinate precision 0.000001) to -37.32765229999 lat and 147.33421133 long, then the resulting lat/long should be -37.327652 and 147.334211.

Similarly, with -37.33 lat, 147.33 long and a coordinate precision of 0.01, the transformed result should be -37.33 and 147.33.

Instead, coordinates seem to be rounded to a uniform 6 decimal places at https://github.com/gbif/pipelines/blob/dev/sdks/core/src/main/java/org/gbif/pipelines/core/parsers/location/parser/CoordinateParseUtils.java#L155, and coordinatePrecision does not seem to be used after reprojection at https://github.com/gbif/pipelines/blob/dev/sdks/core/src/main/java/org/gbif/pipelines/core/parsers/location/parser/LocationParser.java#L142 or https://github.com/gbif/pipelines/blob/dev/sdks/core/src/main/java/org/gbif/pipelines/core/parsers/location/parser/CoordinatesParser.java#L91
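To make the request concrete, here is a minimal Python sketch of the rounding I have in mind. It is not the pipelines code (which is Java); the function name `round_to_precision` is hypothetical, and the `-37.3299981` input is an invented reprojection result used only for illustration. Coordinates are passed as strings so that `Decimal` sees exactly the digits that were supplied.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def round_to_precision(value: str, coordinate_precision: str) -> Decimal:
    """Round a coordinate to the nearest multiple of coordinate_precision.

    Hypothetical helper: both arguments are decimal strings, e.g.
    value="-37.32765229999" and coordinate_precision="0.000001".
    """
    step = Decimal(coordinate_precision)
    # Count how many whole steps fit, rounding to the nearest step,
    # then scale back to degrees.
    steps = (Decimal(value) / step).quantize(Decimal(1), rounding=ROUND_HALF_EVEN)
    return steps * step


# The two cases from this issue:
print(round_to_precision("-37.32765229999", "0.000001"))  # -37.327652
print(round_to_precision("147.33421133", "0.000001"))     # 147.334211
print(round_to_precision("-37.3299981", "0.01"))          # -37.33
```

The choice of half-even rounding is an assumption; any consistent rounding mode gives the same results for the values above.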

tucotuco commented 3 years ago

Why would you expect to truncate the re-projected coordinates? That will put the location in a different place. It goes against best practice, which recommends retaining seven digits of precision from any calculation, whether a re-projection or a coordinate format transformation. If you don't, the reverse transformation will not give you the original location, and the error can propagate with every transformation. GBIF is correct to keep the new precision, but it should keep seven digits instead of six and set the resulting coordinatePrecision to 0.0000001. The effect of the original coordinatePrecision should be captured in an actual georeference, with either a footprintWKT or a coordinateUncertaintyInMeters.

charvolant commented 3 years ago

The fundamental problem is that, in measurement terms, 147.6 is different from 147.60, even though, as numbers, the two are identical. Depending on convention, 147.6 means either "somewhere in the interval [147.6, 147.7)" or "somewhere in the interval [147.55, 147.65)". Standard measurement theory gives the latter; locations derived from grid coordinates give the former. Under the measurement convention, 147.60 means "somewhere in the interval [147.595, 147.605)".

Therefore any reprojection should honour the precision given in the original coordinates. The reverse projection will then honour the same precision and should give a correct result. In the -37.327653/147.334210 example, the reverse transformation will go from -37.327652/147.334211 to (e.g.) -37.3276529451/147.3342098924 and be rounded back to -37.327653/147.334210. It will not put the coordinates in a different place, since the place is a region, given by the coordinatePrecision, and the rounded projection should be in the same region.

Using 7 figures to calculate the intermediate projection is good practice, although this will happen naturally with doubles. Treating a location measured to 1 decimal place as if it were measured to 6 or 7 decimals is not, since it creates a false precision.

I agree that footprintWKT and coordinateUncertaintyInMeters capture the same sort of information in different ways.
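The two readings of 147.6 described at the start of this comment can be sketched in a few lines of Python. This is only an illustration: the function name `implied_interval` and the convention labels `"rounded"` and `"grid"` are made up here, and the interval width is inferred from the number of decimal places written in the string.

```python
from decimal import Decimal


def implied_interval(text: str, convention: str = "rounded"):
    """Return the half-open interval [low, high) implied by a written value.

    "rounded": the value is the nearest representable number
               (the standard measurement reading).
    "grid":    the value is the lower edge of a grid cell.
    """
    value = Decimal(text)
    # One unit in the last written decimal place, e.g. 0.1 for "147.6".
    step = Decimal(1).scaleb(value.as_tuple().exponent)
    if convention == "rounded":
        half = step / 2
        return value - half, value + half
    if convention == "grid":
        return value, value + step
    raise ValueError(f"unknown convention: {convention}")


print(implied_interval("147.6", "grid"))      # (Decimal('147.6'), Decimal('147.7'))
print(implied_interval("147.6", "rounded"))   # (Decimal('147.55'), Decimal('147.65'))
print(implied_interval("147.60", "rounded"))  # (Decimal('147.595'), Decimal('147.605'))
```

Note that "147.6" and "147.60" produce different intervals even though they are equal as numbers, which is the point being made about false precision.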

tucotuco commented 3 years ago

While everything you say about standard measurement theory is well and good, it does not apply here. We are talking about a unit conversion, even though the units on both sides appear to be the same (degrees). They aren't.

A simple example illustrates why. Suppose I tell you I am 6 feet tall, with a measurement precision of 1 unit. Being a good world citizen, you prefer the measurement in meters, so you calculate that I am 1.8288 meters in height, but to be faithful to the unit precision you record that I am 2 meters tall. My international beach volleyball partner will be ecstatic, until the reality sinks in that at 1.95 m he should probably be the designated blocker. Not being what he had hoped for, he dumps me as a partner. Then someone in the US looking for a partner sees that I am 7 feet tall, because the partner-finding website translates my 2 m into feet and rounds. The surprises just keep propagating.

That was supposed to be humorous on purpose, because the actual issue is quite serious. The problem at hand has nothing to do with standard measurement theory and everything to do with fidelity of location on the planet. The net effect of your proposed process would be to shift grid systems from their actual locations just so that the transformed coordinates look like the precision in which they were actually recorded. You would have just added a definitive systemic error, shifting a grid for an entire country, potentially, without accommodating the additional uncertainty it just created and without really telling them.

To give an extreme example, what would you do when the original coordinate system is a 1-km UTM grid (measurement precision of 1000)? Every coordinate on the planet would be 0,0 after conserving precision.

We can't represent the size of the place with coordinates alone; we have to do better.
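For anyone who wants to check the arithmetic in the height example and the 1-km grid example above, here is a small Python sketch. The helper names are hypothetical, and applying a precision of 1000 directly to degree values is my reading of what "conserving precision" would mean in that case.

```python
FEET_TO_METERS = 0.3048  # 1 ft is defined as exactly 0.3048 m


def round_to_unit(value, unit):
    """Round value to the nearest multiple of unit."""
    return round(value / unit) * unit


def round_trip_height(feet, precision=None):
    """Convert feet -> meters -> feet, optionally rounding after each step.

    Returns (meters, feet_after_round_trip).
    """
    meters = feet * FEET_TO_METERS
    if precision is not None:
        meters = round_to_unit(meters, precision)
    back = meters / FEET_TO_METERS
    if precision is not None:
        back = round_to_unit(back, precision)
    return meters, back


# Rounding to the source precision after every conversion: 6 ft -> 2 m -> 7 ft.
print(round_trip_height(6, precision=1))

# Keeping the calculated digits: the round trip returns to (about) 6 ft.
print(round_trip_height(6))

# A precision of 1000 applied to degree values collapses them to zero.
print(round_to_unit(-37.327653, 1000), round_to_unit(147.334210, 1000))
```

The unrounded round trip is only equal to 6 up to floating-point error, which is why the comparison should use a tolerance rather than exact equality.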