OSGeo / proj-datumgrid

Historic repository for proj datum grids. New developments are at https://github.com/OSGeo/PROJ-data

Proposal for future organization of proj-datumgrid #74

Closed rouault closed 4 years ago

rouault commented 4 years ago

I just had an IRC chat with @hobu and we discussed future plans for the organization of proj-datumgrid. The 'regional' model is mostly a workaround to get packages of reasonable size and some structure in the repository, but it isn't perfect. For example, data for France is in europe/, but a number of those grids are for overseas territories in North America, the Caribbean and the Indian Ocean. Also, the regional approach doesn't necessarily match user needs: for Oceania, it is unlikely that a New Zealander will need Australian data, given that the two countries have no common boundary.

I've done a quick experiment converting all our NTv2 and GTX files into DEFLATE-compressed TIFF: the result is 470 MB of compressed TIFF for the whole repository, vs 1.4 GB of uncompressed NTv2 and GTX files. So if RFC4 is implemented and libtiff becomes a required dependency, we could just release one single proj-datumgrid-all.zip file. That would make life easier for us and for users (who are confused about which packages to install). People who want a more fine-grained distribution could potentially use a future proj-datumgrid-install tool ( https://github.com/OSGeo/PROJ/issues/1750 ) that would download (from the CDN?) the grids corresponding to a producing agency. To still have some structure in the proj-datumgrid repository, we could have one subdirectory per producer: ignfrance, linz, noaa, etc. We would just fold everything into a single package for release.
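For readers who want a feel for why DEFLATE helps here, the following is a minimal sketch, not the experiment described above: it does not use libtiff or real grid files. It assumes a toy grid of float32 values in which most cells hold a nodata marker, and compresses the raw bytes with Python's zlib module, which implements DEFLATE. The grid shape, the nodata value and the helper name `synthetic_grid` are all invented for illustration.

```python
import struct
import zlib

NODATA = -9999.0  # illustrative nodata marker, not taken from any real grid


def synthetic_grid(rows: int, cols: int) -> bytes:
    """Build a toy grid of little-endian float32 values.

    Only the central block holds varying values; the rest is nodata,
    which is the kind of redundancy DEFLATE removes well.
    """
    values = []
    for r in range(rows):
        for c in range(cols):
            inside = (rows // 4 <= r < 3 * rows // 4
                      and cols // 4 <= c < 3 * cols // 4)
            values.append(0.001 * (r + c) if inside else NODATA)
    return struct.pack(f"<{len(values)}f", *values)


raw = synthetic_grid(200, 200)
packed = zlib.compress(raw, 9)  # level 9 = best compression

print(f"raw: {len(raw)} bytes, deflate: {len(packed)} bytes, "
      f"ratio: {len(packed) / len(raw):.2f}")
```

The ratio printed for this toy input says nothing about real grids; the figures quoted in the thread (470 MB vs 1.4 GB) work out to roughly one third of the original size.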

rouault commented 4 years ago

I've created a CSV database of our grids in https://github.com/OSGeo/proj-datumgrid/blob/master/filelist.csv It serves several purposes:

ccrook commented 4 years ago

I think this is a great start. I also think it may be appropriate to have a process for connecting to source repositories (e.g. for New Zealand, either GitHub at https://github.com/linz/proj-datumgrid-nz or the zip file of grids at https://www.geodesy.linz.govt.nz/download/proj-datumgrid-nz). This could take the form of pulling directly from the source to maintain the CDN, or of users pulling with an installer script as in https://github.com/OSGeo/PROJ/issues/1750.

My rationale is borrowed from discussion on https://github.com/OSGeo/proj-datumgrid/pull/69 and also list conversation https://lists.osgeo.org/pipermail/proj/2019-November/009047.html.

Essentially the concern is that the use of grids and number of grids is likely to grow, and it feels like that could create a maintenance issue for PROJ.

The reasons why I think managing grids could grow are:

hobu commented 4 years ago

My vision for the CDN project in regards to responsible agencies is to give them a worldwide distribution channel for their grids that mainlines them into a software toolchain that touches a large portion of the geospatial ecosystem in some form or another. I hope that other software packages and frameworks pick up usage of the CDN as well, but I think PROJ is large enough on its own to drive adoption.

The current situation is that responsible agencies place their grids on their local download pages and hope people are able to use them. Software vendors pick these up and sometimes incorporate them in software, but it is haphazard. The distribution channel is what has been missing.

The process of getting a grid into the CDN is going to have a somewhat long time window due to the lag in the steps.

1) Responsible agency creates and refines the grid and declares it ready for release.
2) Responsible agency notifies EPSG of the grid and works the process of getting it into the EPSG database.
3) Wait for an EPSG release.
4) Responsible agency submits the grid to proj-datumgrid project.
5) proj-datumgrid project applies validation process (metadata, format, etc) to grid and adds it to the CDN.
6) New grid is available in PROJ when new PROJ is released with EPSG update.

To get their data into the CDN, agencies will have the extra burden of interacting with the proj-datumgrid project. We need to make this as seamless as possible for them and think about procedures and workflows that do that.

I expect that other coordinate transformation libraries beyond PROJ would also take advantage of the grid CDN if it were sufficiently well designed, organized, and operated. I think that the open source approach of leading with a working example is where we should start the effort, however. It is folly to try to build something from the outset that everyone might possibly use.

rcoup commented 4 years ago

@hobu couldn't steps 4/5 happen in parallel to 2/3?

I can't see how making it available via the CDN before it's available in EPSG/PROJ releases would break anything, and it means new grids can be used as soon as the PROJ release is published.

hobu commented 4 years ago

> @hobu couldn't steps 4/5 happen in parallel to 2/3?

For sure it can, but I think the grid needs to be buttoned up and released and in the EPSG db before the CDN starts advertising it.
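The ordering discussed in the last few comments can be sketched as a small dependency graph. This is only an illustration of the idea, assuming that steps 4/5 run alongside steps 2/3 and that the CDN only starts advertising the grid once both the EPSG release and the proj-datumgrid validation are done. The step names are invented labels for the numbered steps above, and `cdn_advertise` is a hypothetical extra node standing for "the CDN starts advertising it".

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each step maps to the set of steps that must finish before it can start.
DEPENDENCIES = {
    "release": set(),                        # step 1
    "epsg_submit": {"release"},              # step 2
    "epsg_release": {"epsg_submit"},         # step 3
    "datumgrid_submit": {"release"},         # step 4, parallel to step 2
    "datumgrid_validate": {"datumgrid_submit"},  # step 5, parallel to step 3
    # Gate on both branches: EPSG has registered the grid and it is validated.
    "cdn_advertise": {"epsg_release", "datumgrid_validate"},
    "proj_release": {"epsg_release", "datumgrid_validate"},  # step 6
}


def waves(dependencies):
    """Group steps into waves; steps in the same wave can run in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        result.append(ready)
        sorter.done(*ready)
    return result


for number, wave in enumerate(waves(DEPENDENCIES), start=1):
    print(number, wave)
```

Grouping into waves makes the time saving visible: the two branches overlap, so the overall lag is driven by the slower branch rather than by the sum of both.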

rouault commented 4 years ago

I mostly agree with @hobu. As much as possible we want things to be validated by EPSG. It minimizes the risks and actions on our side. As there are sometimes/often adaptations of filenames between what EPSG registers and what we use, we have a grid_alternatives table in proj.db that maps EPSG filenames to PROJ filenames (currently it is also used to specify in which package - europe, oceania, etc. - the grid is found). The above holds for cases where we want projinfo -s EPSG:FOO -t EPSG:BAR to work automatically.
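The filename mapping described above can be pictured with a small in-memory SQLite table. This is a simplified stand-in, not the actual proj.db schema: the three column names and every row are assumptions made for illustration, and `resolve` is a hypothetical helper.

```python
import sqlite3

# Simplified stand-in for the grid_alternatives table (assumed columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE grid_alternatives ("
    " original_grid_name TEXT PRIMARY KEY,"  # name as registered by EPSG
    " proj_grid_name TEXT NOT NULL,"         # name of the file PROJ ships
    " package_name TEXT)"                    # package that contains the file
)
conn.executemany(
    "INSERT INTO grid_alternatives VALUES (?, ?, ?)",
    [
        # Illustrative rows only, not real registry entries.
        ("AGENCY_GRID_2019.gsb", "agency_2019.gsb", "proj-datumgrid-europe"),
        ("other_agency.gsb", "other_agency.gsb", "proj-datumgrid-oceania"),
    ],
)


def resolve(epsg_name):
    """Return (proj_grid_name, package_name), or None if not registered."""
    return conn.execute(
        "SELECT proj_grid_name, package_name FROM grid_alternatives"
        " WHERE original_grid_name = ?",
        (epsg_name,),
    ).fetchone()


print(resolve("AGENCY_GRID_2019.gsb"))
print(resolve("unknown.gsb"))
```

A lookup that returns None corresponds to a grid that EPSG-driven operations cannot find automatically.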

Once the CDN mechanism is in place, I'm wondering if we would want a +grids=foo.tif to cause a CDN access even if the grid is not known in the grid_alternatives table of proj.db. That's an open question. There are probably future 'advanced' datasets that perhaps don't fit in what EPSG wants to register, but that could be used manually with "+proj=superweirdtransformation +grid_1=foo.tif +grid_2=bar.tif". One example of such superweirdtransformation is the existing deformation method : https://proj.org/operations/transformations/deformation.html
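To make the open question concrete, here is a hedged sketch of one possible lookup policy. It is not how PROJ behaves: `CDN_BASE` is a placeholder URL, `grid_names` and `locate` are hypothetical helpers, and the `allow_unknown_cdn` flag simply stands for the undecided choice of whether a grid missing from grid_alternatives may still trigger a CDN access.

```python
import re
from typing import Iterable, List, Optional

CDN_BASE = "https://cdn.example.org/"  # hypothetical placeholder, not a real endpoint


def grid_names(proj_string: str) -> List[str]:
    """Collect file names from +grids=... and +grid_N=... parameters."""
    names = []
    for _key, value in re.findall(r"\+(grids|grid_\d+)=(\S+)", proj_string):
        names.extend(part for part in value.split(",") if part)
    return names


def locate(name: str, local: Iterable[str], known: Iterable[str],
           allow_unknown_cdn: bool) -> Optional[str]:
    """Decide where a grid would come from under the sketched policy."""
    if name in set(local):
        return "local:" + name   # already installed, no network access
    if name in set(known) or allow_unknown_cdn:
        return CDN_BASE + name   # fetched from the CDN
    return None                  # unknown grid and the fallback is disabled


pipeline = "+proj=superweirdtransformation +grid_1=foo.tif +grid_2=bar.tif"
for grid in grid_names(pipeline):
    print(grid, locate(grid, local=["bar.tif"], known=[], allow_unknown_cdn=True))
```

With `allow_unknown_cdn=False`, the same call leaves foo.tif unresolved, which is the stricter behaviour where only grids listed in grid_alternatives can be fetched.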

kbevers commented 4 years ago

@rouault When would you introduce this change? At 7.0.0?

> As much as possible we want things to be validated by EPSG.

Agreed. I also think that it might be time to formalize that in an RFC. Some form of governance model for which transformations we allow in the PROJ packages.

> One example of such superweirdtransformation is the existing deformation method

It is only weird because we lack the ability to use multi-layered grids (n>2) at the moment. When we eventually support nicer grid formats the operation will be changed to support that.

rouault commented 4 years ago

> When would you introduce this change? At 7.0.0?

A single-package organization, you mean? Yes, probably for 7.0. We should likely stick with the current organization for 6.3.0. I think I wrote somewhere that before 6.3.0 we should make sure that grid_alternatives entries point to precise versions of the proj-datumgrid-XXXX packages and not to -latest (which was a convenience to avoid forgetting to update it), in case we still keep the regional organization but put .tif grids in the packages instead, which PROJ 6.3 wouldn't like.
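The check mentioned above, making sure no entry still points at a -latest package, could look like the following sketch. The package names and version numbers are invented examples, and `unpinned` is a hypothetical helper rather than part of any existing tooling.

```python
def unpinned(package_names):
    """Return the package names that still use the floating -latest suffix."""
    return sorted(name for name in package_names if name.endswith("-latest"))


# Illustrative values only; the versions are made up.
packages = [
    "proj-datumgrid-europe-latest",
    "proj-datumgrid-oceania-1.2",
    "proj-datumgrid-north-america-latest",
]

for name in unpinned(packages):
    print("not pinned to a precise version:", name)
```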

rouault commented 4 years ago

Implemented by https://github.com/OSGeo/PROJ-data