ACCESS-Cloud-Based-InSAR / dem-stitcher

Download and merge DEM tiles
Apache License 2.0
41 stars 15 forks source link

Resampling and Translation Bug #33

Closed cmarshak closed 2 years ago

cmarshak commented 2 years ago

Addresses #32.

review-notebook-app[bot] commented 2 years ago

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

cmarshak commented 2 years ago

I am going to write one more test for the geoid reader.

cmarshak commented 2 years ago

@asjohnston-asf - when it makes sense, can you help me understand why one of my tests is failing here (i.e. tests/test_rio_tools.py? I thought it might be because data was correctly uploaded to the repo, then I successfully tried the tests by cloning it on mac and then on a linux server. Honestly, not totally sure how to debug. It looks like the first array is all 0's but it shouldn't be...

The data is here: https://github.com/ACCESS-Cloud-Based-InSAR/dem-stitcher/tree/reproject-and-test/tests/data/rio_tools/update_resolution

Here is one failed action. https://github.com/ACCESS-Cloud-Based-InSAR/dem-stitcher/runs/6564288126?check_suite_focus=true

I temporarily marked it as an integration test so it's ignored.

Maybe it's an action issue?

cmarshak commented 2 years ago

This is way too large of a pull request for a time-strapped team to review carefully (40 files changed) and so I understand there is going to be some element of trust. I am putting some of the higher level elements of the review.

Integration with DockerizedTopsApp

Also, as apart of this pull request - is the related one here in DockerizedTopsApp. In the pull request, note we pull this branch in the environment file. I ran the code on 4 different areas (their selection was motivated by previous issue tickets) and the resulting GUNWs can be downloaded there. Ultimatley, the 90 meter GUNW doesn't show much of a difference using this new dem.

Most important for @sssangha and @gracebato to take a look at the datasets and just confirm they look reasonable and I am not missing something. The new Dem-stitcher does not extend the origin of DEM tiles if they are not available (this is a feature, not a bug). The resulting GUNW over La Palma is truncated - but its georeferencing is fine as can be inspected.

Documentation about "stitching" and "resampling"

Within this pull request and now the readme, I tried to explain how the tiles are stitched and transformed.

I also included two notebooks to demonstrate that (1) if no geoid is removed the tiles are subset but not resampled and (2) to compare ISCE2 dem.py to our stitcher.

@jhkennedy and @sssangha - I would like you to provide feedback.

API Changes

  1. We add dst_resolution so that we can specify a target square resolution or (x_res, y_res) resolution. We make sure that we do not adjust the origin as is frequently done in rasterio.
  2. We return the API to return the tuple of (X, p). This will greatly support ACCESS's "virtual data processing" (processing in memory) since we can pass the arrays to subsequent workflows directly.
  3. I added a docstring to the main stitch function.

Tests

There are ton of tests that make sure that we are correctly stitching the tiles together. Here are some high level descriptions:

  1. In test_window.py, we want to make sure when we grab an extent within an image we do not do any resampling. There are additional tests associated if the extent is outside the image's extent.
  2. In test_stitcher.py, we are making sure that when we merge there again is no resampling. One of the tests is distilling this notebook from above and making sure when we compare the stitched DEM to the original tiles and do not remove geoid (and do not alter the origin) we are not modifying the original tile data at all.
  3. test_geoid.py does some combination of the first two tests in the context of reading geoids. It also illustrates what a reasonable buffer which depends on the resolution of the geoid relative to the DEM raster that it is being removed from.
  4. There is also test_rio_tools which makes sure that when we upsample the origin does not change.

There is also an integration test (which is ignored by github actions) which does reach onto the servers to make a stitched DEM, but makes sure the main stitch function goes until completion.

Packaging

Unfortunatley, there were packaging issues (see [1] and [2]) that cropped up due to the changes related to gdal and rasterio since I worked with them last that made this pull request harder. It was essential that the DEM stitcher still worked with DockerizedTopsApp which uses conda-forge packages specifically its gdal. Long story short, we found that gdal had to be version <3.4. We also discovered that the PyPI rasterio is not installing correctly for python 3.10+ as in [2] (we have an environment that installs a beta release but it is not tested here).

These issues were first exposed after writing all the tests and passing in an environment with an older python worked fine but they would fail on GitHub actions. After some help from @jhkennedy, it was found that all resampling was defaulting to nearest in certain environments and therefore the above test_rio_tools.py was failing; this was no good to say the least (wanted our geoid to be resampled smoothly to the DEM tiles!). After some investigations, we have settled on the current solution. I hope that @jhkennedy and @asjohnston-asf might help make this cleaner in the future. Currently, I have two environment files (one for PyPi and one for conda) and they are both tested against the new suite - the key is gdal (and maybe proj).

[1] https://github.com/conda-forge/rasterio-feedstock/issues/243 [2] https://github.com/rasterio/rasterio/issues/2473

Code Organization

Please provide feedback if there is time!

cmarshak commented 2 years ago

To illustrate that the outputs preserve the georeferencing of the original DEM tiles, here is an example output over a small .1 degree area overlapping 4 tiles in Central CA.

Consider the following:

from dem_stitcher.stitcher import stitch_dem, get_dem_tiles
import rasterio

dst_area_or_point = 'Point'
dst_ellipsoidal_height = False
dem_name = 'glo_30'

# Central coast of California
# xmin, ymin, xmax, ymax
bounds = [-120.05, 34.95, -119.95, 35.05]

X, p = stitch_dem(bounds,
                  dem_name=dem_name,
                  dst_ellipsoidal_height=dst_ellipsoidal_height,
                  dst_area_or_point=dst_area_or_point)

height_type = 'ellipsoidal' if dst_ellipsoidal_height else 'geoid'

with rasterio.open(f'{dem_name}_{height_type}_{dst_area_or_point}_CA.tif', 'w', **p) as ds:
    ds.write(X, 1)
    ds.update_tags(AREA_OR_POINT=dst_area_or_point) 

The output generates this.

The source tiles are here

https://copernicus-dem-30m.s3.amazonaws.com/Copernicus_DSM_COG_10_N34_00_W121_00_DEM/Copernicus_DSM_COG_10_N34_00_W121_00_DEM.tif
https://copernicus-dem-30m.s3.amazonaws.com/Copernicus_DSM_COG_10_N34_00_W120_00_DEM/Copernicus_DSM_COG_10_N34_00_W120_00_DEM.tif
https://copernicus-dem-30m.s3.amazonaws.com/Copernicus_DSM_COG_10_N35_00_W121_00_DEM/Copernicus_DSM_COG_10_N35_00_W121_00_DEM.tif
https://copernicus-dem-30m.s3.amazonaws.com/Copernicus_DSM_COG_10_N35_00_W120_00_DEM/Copernicus_DSM_COG_10_N35_00_W120_00_DEM.tif

If you look at the tiles with the output in QGIS, you should see no shift with the "stitched-dem" and the source tiles.

cmarshak commented 2 years ago

@jhkennedy

I hope at some point we can add python 3.10 using conda - but won't worry about it now. Also, we can add conda install instructions once we get it onto conda-forge.