OSGeo / gdal

GDAL is an open source MIT licensed translator library for raster and vector geospatial data formats.
https://gdal.org
Other
4.94k stars 2.57k forks source link

[regression] ogr2ogr should rely on recommended transformation when reprojecting to WGS84 #2219

Open nirvn opened 4 years ago

nirvn commented 4 years ago

Expected behavior and actual behavior.

This is a regression in gdal 3. When converting a non-WGS84 source dataset to KML (which forces a reprojection to WGS84), the source dataset CRS's recommended transformation isn't taken into account, leading to inaccurate results.

Steps to reproduce the problem.

E.g., with the following dataset (in Indian 1960 48N): ELCs2012.zip

Typing ogr2ogr -f KML out.kml ELCs2012.shp will result in an out.kml that is +/- 500 meter shifted northwest. That's a sign the recommended transformation for Indian 1960 -> WGS84 was not picked / taken into account.

Visually, it looks like this: image source dataset in gray, out.kml is red

Operating system, GDAL version and provenance

Ubuntu 20.04

gdal 3.0.4, proj 6.3.0

rouault commented 4 years ago

$ cat ELCs2012.prj PROJCS["Indian_1960_UTM_zone_48N",GEOGCS["GCS_Indian 1960",DATUM["D_Indian_1960",SPHEROID["Everest_Adjustment_1937",6377276.345,300.8017]],PRIMEM["Greenwich",0],UNIT["Degree",0.017453292519943295]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",105],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["Meter",1]]

$ cat ELCs2012.qpj PROJCS["UTM Zone 48, Northern Hemisphere",GEOGCS["unnamed ellipse",DATUM["unknown",SPHEROID["unnamed",6377276.345,300.8017000000115],TOWGS84[198,881,317,0,0,0,0]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",105],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["Meter",1]]

So the TOWGS84[] is only in the QGIS specific .qpj, not in the .prj (which is expected since ESRI WKT doesn't capture TOWGS84)

I'd say that GDAL behaves as expected given that it ignores .qpj

rouault commented 4 years ago

Enabling debug output

$ PROJ_DEBUG=2  ogr2ogr -f KML out2.kml ELCs2012.shp --debug on
Shape: DBF Codepage = UTF-8 for ELCs2012.shp
Shape: Treating as encoding 'UTF-8'.
GDAL: GDALOpen(ELCs2012.shp, this=0x16b0350) succeeds as ESRI Shapefile.
GDAL: QuietDelete(out2.kml) invoking Delete()
GDAL: GDALDriver::Create(KML,out2.kml,0,0,0,Unknown,(nil))
KML: Attempt to create: out2.kml
PROJ: pj_open_lib(proj.db): call fopen(/home/even/proj/install-proj-master/share/proj/proj.db) - succeeded
PROJ: pj_open_lib(proj.ini): call fopen(/home/even/proj/install-proj-master/share/proj/proj.ini) - succeeded
Warning 1: The output driver does not seem to natively support Integer64 type for field ID. Converting it to Real instead. -mapFieldType can be used to control field type conversion.
PROJ: Using coordinate operation Inverse of UTM zone 48N + Ballpark geographic offset from Indian 1960 to WGS 84
PROJ: Using coordinate operation Inverse of UTM zone 48N + Indian 1960 to WGS 84 (2)
PROJ: Using coordinate operation Inverse of UTM zone 48N + Ballpark geographic offset from Indian 1960 to WGS 84
PROJ: Using coordinate operation Inverse of UTM zone 48N + Indian 1960 to WGS 84 (2)
PROJ: Using coordinate operation Inverse of UTM zone 48N + Ballpark geographic offset from Indian 1960 to WGS 84
PROJ: Using coordinate operation Inverse of UTM zone 48N + Indian 1960 to WGS 84 (2)
PROJ: Using coordinate operation Inverse of UTM zone 48N + Ballpark geographic offset from Indian 1960 to WGS 84
PROJ: Using coordinate operation Inverse of UTM zone 48N + Indian 1960 to WGS 84 (2)
PROJ: Using coordinate operation Inverse of UTM zone 48N + Ballpark geographic offset from Indian 1960 to WGS 84
PROJ: Using coordinate operation Inverse of UTM zone 48N + Indian 1960 to WGS 84 (2)
PROJ: Using coordinate operation Inverse of UTM zone 48N + Ballpark geographic offset from Indian 1960 to WGS 84
PROJ: Using coordinate operation Inverse of UTM zone 48N + Indian 1960 to WGS 84 (2)
PROJ: Using coordinate operation Inverse of UTM zone 48N + Ballpark geographic offset from Indian 1960 to WGS 84
PROJ: Using coordinate operation Inverse of UTM zone 48N + Indian 1960 to WGS 84 (2)
PROJ: Using coordinate operation Inverse of UTM zone 48N + Ballpark geographic offset from Indian 1960 to WGS 84
PROJ: Using coordinate operation Inverse of UTM zone 48N + Indian 1960 to WGS 84 (2)
PROJ: Using coordinate operation Inverse of UTM zone 48N + Ballpark geographic offset from Indian 1960 to WGS 84
PROJ: Using coordinate operation Inverse of UTM zone 48N + Indian 1960 to WGS 84 (2)
PROJ: Using coordinate operation Inverse of UTM zone 48N + Ballpark geographic offset from Indian 1960 to WGS 84
PROJ: Using coordinate operation Inverse of UTM zone 48N + Indian 1960 to WGS 84 (2)
PROJ: Using coordinate operation Inverse of UTM zone 48N + Ballpark geographic offset from Indian 1960 to WGS 84
GDALVectorTranslate: 287 features written in layer 'ELCs2012'
Shape: 287 features read on layer 'ELCs2012'.
GDAL: GDALClose(ELCs2012.shp, this=0x16b0350)
GDAL: GDALClose(out2.kml, this=0x16ad7d0)
GDAL: In GDALDestroy - unloading GDAL shared library.

So depending on points to transform PROJ switches between a ballpart (= null shift) and using Indian 1960 to WGS 84 (2) (+proj=helmert +x=198 +y=881 +z=317) It seems there are points to transform that are outside of the area of use of "Indian 1960 to WGS 84 (2)"

nyalldawson commented 4 years ago

@rouault fyi qgis doesn't use qpj for proj 6 builds anymore. It's a historic relic

jratike80 commented 4 years ago

Where QGIS gets the list of alternative transformations? Projinfo shows only one candidate:

projinfo -s epsg:4326 -t epsg:3148
Candidate operations found: 1
Note: using '--spatial-test intersects' would bring more results (4)
-------------------------------------
Operation n┬░1:

unknown id, Ballpark geographic offset from WGS 84 to Indian 1960 + UTM zone 48N, unknown accuracy, World, has ballpark transformation

PROJ string:
+proj=pipeline +step +proj=axisswap +order=2,1 +step +proj=unitconvert +xy_in=deg +xy_out=rad +step +proj=utm +zone=48 +ellps=evrst30
nyalldawson commented 4 years ago

@jratike80 use "--spatial-test intersects"

nirvn commented 4 years ago

@rouault , we can all ignore the .qpj indeed :)

Alright, as stated in another ticket, at the very least for Indian 1960 48N, the area of use for the helmert transformation is totally wrong. I think we have to look at area of use as suggestive, not absolute. That's what QGIS does, and IMHO it's the right approach here. The helmert transformation is the recommended pick for Indian 1960 48N to WGS84 transformations there.

That helmert transformation has been applied to this Indian 1960 48N CRS across mainland Southeast Asia. The theoretical narrow area of use stated in that (1993?) US military document doesn't reflect the ground reality (heck, it doesn't even reflect US military usage of that CRS back in the 60s!).

While I can't judge of how many area of use statements don't match reality, I'm sure this is not the only case.

On top of which, it's a regression when compared to gdal pre 3.0. In gdal 2 (as well as gdal 1), doing that ogr2ogr command while result in a properly transformed KML. That's because gdal2 was unconditionally (and in fact, rightly :) ) applying the +towgs84 parameter back then.

Which means we've had well over a decade of open source dealing with Indian 1960 48N <-> WGS84 conversion applying the helmert transformation across the whole Indian 1960 48N covered area (Cambodia, part of Vietnam, southern Laos, etc.).

jratike80 commented 4 years ago

The title is using term "recommended transformation". Recommended by whom?

This ticket seems to be a reversed version of this 9 years old one https://trac.osgeo.org/qgis/ticket/3645 There I can read " in GDAL 1.8.0, a lot of TOWGS84 have been added by taking into account the preferred datum shift from the EPSG database" and by following the tracks to the corresponding changeset https://trac.osgeo.org/gdal/changeset/18978 it seems that the EPSG database version was 7.4.1. Then I downloaded newest EPSG db version 9.8.6 and tried to find the preferred datum shifts with no success. Certainly I do not know where to search, and perhaps the database is changed meanwhile.

I am looking the situation as an unaware GDAL user. If I happen to receive one day some data in EPSG:3148, should I know from some authorative source that I must apply TOWGS84[198,881,317,0,0,0,0] ? Or is is so that people have just started to apply those TOWGS84 parameters for Indian 1960 48N CRS across mainland Southeast Asia and it is best to compare each dataset with Google maps first?

nirvn commented 4 years ago

If a user as been using gdal since 1.8 and has gotten one result when doing Indian 1960 48N <-> WGS84 transformations, I think there's something to argue about that decade old behavior having established a standard of its own, which gdal 3.0 should respect.

I've looked into this a bit more, and found 370 (!) CRSes in PROJ6, when compared against GDAL2/PROJ5, that output a proj string missing a default +towgs84 helmert transformation. See the attached CSV file for the list of those CRSes: result.zip

I think PROJ >= 6 adds a +towgs84 parameter into the proj string if it detects an helmert transformation that matches the CRS bounding box. Is that correct @rouault ?

If so, I would propose to remedy to this user-end regression by adding into PROJ's transformation database the default helmert transformation missing for these 370 CRSes with an area of use that matches the associated CRS bounding box. We can add a remark on those transformation along the lines of "derived from gdal 1/2".

That way, gdal would maintain its decade-long reprojection results.

rouault commented 4 years ago

I think PROJ >= 6 adds a +towgs84 parameter into the proj string if it detects an helmert transformation that matches the CRS bounding box. Is that correct @rouault ?

yes

If so, I would propose to remedy to this user-end regression by adding into PROJ's transformation database the default helmert transformation missing for these 370 CRSes with an area of use that matches the associated CRS bounding box.

GDAL 2 choice was more or less arbitrary, sometimes "right", sometimes "wrong". The transformation parameters it used should most of the time still be in PROJ 6 database (except if they have been deprecated in the mean time by EPSG), so PROJ might use them.

Honestly I'm not sure about the best way forward. Perhaps you should issue a change request to EPSG so that they extend the area of use of "Indian 1960 to WGS 84 (2)" ?. I'd really want to avoid PROJ database to have custom overrides as much as possible. This will become a maintenance nightmare.

nirvn commented 4 years ago

@rouault , as your quotation marks imply, "right" and "wrong" is somewhat arbitrary here. In that ambiguity, I feel if gdal has had a consistent transformation rule for a decade+, there's a burden of consistency that needs acknowledging here :)

To stick with the Indian 1960 48N, gdal (and arc, for what it's worth) has reprojected coordinates using the helmert transformation. This represent a decade of datasets that are only "right" using that transformation.

While I'm there with you in terms of avoiding corner case tweaks for corner scenarios, one decade makes a behavior a rule, not an exception. :)

In terms of filing a request for change of the registry, while I'm probably in a relatively good position to go through that motion, I feel it's not the right approach to take care of the bigger picture. We have at a demonstrable 370 CRSes that wont reproject under gdal3 the way they did for a decade, across all continents.

nirvn commented 4 years ago

Oh, also, adding additional "gdal 2 transformations" remarked transformations to the registry IMHO isn't really a hack, it fits in quite nicely with the new ecosystem. People can pick alternative transformation if needed.

Cheers.

rouault commented 4 years ago

@nirvn I'm not against your proposal, but still a bit uneasy about it. I'm not sure how much of the 370 CRS are really problematic situations, that is where users did rely on a Helmert transformation outside of the area of use advertized by EPSG.

If we were to do what you suggest, we'd have to consider to which authority (PROJ ? GDAL2 ?) we'd attribute the new transformation codes (there might be algorithmic changes to do in PROJ code and/or entries of the authority_to_authority_preference database table, so that it behaves appropriately. appropriately to be defined...), and which accuracy we'd advertize (probably "unknown"). Anyway the solution if any is in PROJ, not GDAL, so this should be a PROJ ticket. Experimenting with a couple cases before potentially generalizing to the 370 CRS would probably be a good idea. I'm happy if someone takes the lead on this

nirvn commented 4 years ago

I like the idea of a GDAL2 authority. And yeah, unknown accuracy.

I could try and prototype something around that specific CRS I'm intimate with, then expand to a couple of additional CRSes that can be proofed.

nirvn commented 4 years ago

Though for the record: I don't have a working environment to build proj and gdal so my good will might not let me too far ;)

rouault commented 4 years ago

@nirvn If you do have one for QGIS, then that shouldn't be too hard. I'm happy to give guidance

jratike80 commented 4 years ago

Adding more historical notes: it was about here when the preferred datum shift appeared https://trac.osgeo.org/gdal/ticket/3357. And "PREFERRED" is the last column in file "datum_shift.csv" in tthe csv file that we used to have in those Proj4 days. And here Frank W. describes how the column was filled https://fwarmerdam.blogspot.com/2010/03/in-last-few-weeks-i-believe-i-have-made.html

The larger datum shift area can be seen here together with the area of use for EPSG:3148 towgs84

For me this Indian 1960 case feels exceptional. At least I hope that other towgs84 values have not been applied to so much larger area than the documented area of use. The method used by Frank was based on rather soft soil (even it did make also me happy with the Finnish KKJ system that used to lack towgs84 parametes and was therefore 120 meters off) and I do not believe it is good to put all those PREFERRED tags automatically.

ESRI and QGIS way is to let user to select from a list and that feels right to me. However, they have GUI and warnings and selections do no suit so well for command line usage.

rouault commented 4 years ago

Thanks Jukka for digging back into history !

However, they have GUI and warnings and selections do no suit so well for command line usage.

Indeed. @nirvn GDAL >= 3.0.3 associated to PROJ 6.3.0 just relies on proj_create_crs_to_crs() to select the transformation, which does this decision independently on each point (as you can see in the log I put in https://github.com/OSGeo/gdal/issues/2219#issuecomment-582572386). If we were to add a GDAL2 authority in PROJ with transformations with unknown accuracy, I'm not completely sure PROJ would use it automatically rather than the ballpark/no-op one (maybe... Needs testing. I just don't master the monster I've given life to :-)). But what is sure is that if you've a point that would fall in the area of use of another EPSG transformation, that one would be used.

nirvn commented 4 years ago

@jratike80 , in the KKJ case, the towgs84 parameter is showing up under proj6, lucky you ;)

I don't see much of a negative in adding a GDAL2 authority with transformation it carried along for a decade (of unknown accuracy); since the proj_create_crs_to_crs will prefer non-unknown accuracy, it'll pick more relevant transformation when falling within areas of use of knonw-accuracy transformation when relevant.

@rouault , you raise a good point, code might have to be slightly modified to add extra weight to herlmert when compared against a ballpark/no-op one. I.e., herlmert of unknown accuracy and area of use X transformation would weight more than a ballpark/no-op transformation with matching unknown accuracy and area of use.

jratike80 commented 4 years ago

For helping the next generation of archaeologists, is there anything else published about the ground reality that what you wrote above?

That helmert transformation has been applied to this Indian 1960 48N CRS across mainland Southeast Asia. The theoretical narrow area of use stated in that (1993?) US military document doesn't reflect the ground reality

Is the ground reality exclusive, do you believe that there are no much data using Indian 1960 in some areas without datum shifts? The area of use of Indian 1960 covers according to EPSG registry area of "Cambodia and Vietnam - onshore & Cuu Long basin" (area code 4007 in the EPSG database). By your knowledge, is this correct or is Indian 1960 used also elsewhere? Do you know if for example ESRI has been using the same towgs84 parameters during the last decade?

For preventing that GDAL does this with GDAL2 authority datum shifts while other software do something else based on EPSG only, perhaps there should be at least a switch for turning the GDAL2 authority off.

nirvn commented 4 years ago

@jratike80 , there was a great paper written by Chistophe Feldkotter, a consultant employed by the Mekong River Commission (a regional body in mainland Southeast Asia), covering this. I can't find it out.

That said, there's plenty of example throughout the last decade in Cambodia of government sub decrees etc using Indian 1960 all over its territory (with the towgs84 parameter used for transformation). For e.g, here's a land reclassification sub decree with an annexed map: https://data.opendevelopmentcambodia.net/en/dataset/sub-decree-on-reclassification-of-4385-hectares-of-land-located-in-beoung-per-wildlife-sanctuary-in -- the UTMs are in Indian 1960 48N. If you project those UTMs onto WGS84 without the helmert transformation, the boundaries won't align.

You can also download the [Japanese gov.] JICA-produced topo map for Cambodia here: http://arunatechnology.com/resources/downloads/ -- you'll see the grid matches Indian 1960 with the towgs84 parameter. I'm pretty sure JICA also used that CRS to produce topo sheets for Laos too.

Similarly, ESRI has been using the same towgs84 since early 200s; I'm not an ESRI user, I'm not intimately familiar with it.