arup-group / genet

Manipulate MATSim networks via a Python API.
MIT License

PROJ, proj-data, and accurate reprojections #213

Open brynpickering opened 11 months ago

brynpickering commented 11 months ago

I have been wondering for a while why my local build fails a bunch of tests every time. It turns out that the contents of the PROJ library differ between conda-forge and other sources such as Homebrew.

The conda-forge version sets an environment variable that leads to transformation gridfiles being pulled from the internet where possible. The Homebrew version doesn't set this, but it downloads the gridfiles and stores them locally on installation (see here). To get these same data files with conda-forge, the proj-data package also needs to be installed. It's a big boy: ~700MB.
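
To see which of these setups a given environment is in, something like the following could help. It is only a sketch: it assumes the environment variable in question is `PROJ_NETWORK` (the variable PROJ documents for toggling network access to gridfiles), and the helper names `proj_network_setting` and `describe_proj_setup` are made up for illustration. The `pyproj` import is guarded purely so the snippet stands alone.

```python
import os


def proj_network_setting(env=None):
    """Return the normalised PROJ_NETWORK value (e.g. "ON"/"OFF"), or None if unset."""
    env = os.environ if env is None else env
    value = env.get("PROJ_NETWORK")
    return value.strip().upper() if value is not None else None


def describe_proj_setup():
    """Collect what the current environment says about PROJ data access."""
    report = {"PROJ_NETWORK": proj_network_setting()}
    try:
        import pyproj
        import pyproj.datadir
        import pyproj.network
    except ImportError:
        # pyproj is not installed, so only the environment variable is reported.
        return report
    # Version of the PROJ library that pyproj is linked against.
    report["proj_version"] = pyproj.proj_version_str
    # Whether PROJ is currently allowed to fetch gridfiles over the network.
    report["network_enabled"] = pyproj.network.is_network_enabled()
    # Directory (or directories) PROJ searches for local data files.
    report["data_dir"] = pyproj.datadir.get_data_dir()
    return report
```

Printing `describe_proj_setup()` in the conda-forge and Homebrew environments side by side should show whether the two differ in network access, data directory, or both.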

The conda-forge PROJ has the advantage of a lower footprint on disk, but the attempt to query online gridfiles has an associated bug which I've come across (and which led me down this rabbit hole): https://github.com/pyproj4/pyproj/issues/705

Oddly enough, genet tests only pass with the conda packages installed if I do not install proj-data and force PROJ to use its internal (quite simple) transformation grids (pyproj.network.set_network_enabled(False) in genet/__init__.py). This suggests that we are not accessing this large transformation library by default. If our tests are running on the simplified transformations, there are possible accuracy improvements to be had...
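
For reference, a minimal sketch of the workaround described above. `pyproj.network.set_network_enabled` and `pyproj.network.is_network_enabled` are real pyproj functions (pyproj >= 3.0); the wrapper `disable_proj_network` is a hypothetical name, and the `ImportError` guard is there only so the snippet runs standalone, since genet itself depends on pyproj.

```python
def disable_proj_network():
    """Turn off PROJ network access and report the resulting state.

    Returns None if pyproj is unavailable, otherwise the value of
    pyproj.network.is_network_enabled() after the change.
    """
    try:
        import pyproj.network
    except ImportError:
        return None
    # Stop PROJ from trying to fetch gridfiles from the internet.
    pyproj.network.set_network_enabled(False)
    return pyproj.network.is_network_enabled()


if __name__ == "__main__":
    print("PROJ network enabled:", disable_proj_network())
```

With the network off and proj-data absent, PROJ can only use whatever transformation data ships with the base install, which is the configuration under which the tests currently pass for me.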

brynpickering commented 11 months ago

In addition, I've checked the Docker image builds on AWS (do a search for libproj / proj-data). The use of apt-get is leading to the old transformations being installed (PROJ v7.x), which we found gave different results from the more current versions of PROJ (>= v8.x). We discussed that issue in #167.

On the GitHub CI it is v8.x being installed.
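
Until the install pipeline pins PROJ explicitly, a runtime guard could at least make the version mismatch visible. The sketch below is an assumption about how that might look, not something genet does today: `check_proj_version` is a hypothetical helper, and the minimum of 8 is taken from the version split described above.

```python
def proj_major_version(version_str):
    """Extract the major version number from a string such as "8.2.1"."""
    return int(version_str.split(".")[0])


def check_proj_version(version_str, minimum_major=8):
    """Raise if the PROJ version is older than the required major version."""
    major = proj_major_version(version_str)
    if major < minimum_major:
        raise RuntimeError(
            f"PROJ {version_str} found, but >= v{minimum_major}.x is expected; "
            "reprojection results may differ."
        )
    return major
```

In practice the string would come from `pyproj.proj_version_str`, e.g. `check_proj_version(pyproj.proj_version_str)` at import time or in a test fixture.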

Setting minimum versions with apt-get seems to be a pain, so this is another reason to move to a fully conda-fied install/build pipeline, where we can set the PROJ version explicitly and decide whether to include proj-data in an install.