GeoscienceAustralia / dea-notebooks

Repository for Digital Earth Australia Jupyter Notebooks: tools and workflows for geospatial analysis with Open Data Cube and Xarray
https://docs.dea.ga.gov.au/notebooks/
Apache License 2.0
441 stars 128 forks source link

Harmonize CRS handling across notebooks/utilities #600

Closed Kirill888 closed 1 year ago

Kirill888 commented 4 years ago

History

The way CRS information is stored and accessed in datacube has changed over the years. Those changes were driven primarily by two factors: better interoperability with xarray/rasterio and other geospatial libraries and better "handling of various xarray operations".

Initially CRS information was stored as a datacube CRS class in an attribute named crs on each datavariable and on dataset object returned from dc.load(..). This approach was problematic as it prevented methods like xarray.to_netcdf(..) from working (didn't know what to do with crs attribute that is not a string). It also suffered from "missing attribute" problem, some common xarray operations like type casting could drop crs attribute.

Datacube code was then changed for a brief period to store crs attribute on the x,y coordinates, and to be a string and not a CRS class, this allowed for better "survivability" of CRS data across per-pixel operations, as coordinates are essentially copied from source to destination including all the attributes attached to them. Turning crs into a string allowed for ability to save data with xarray.to_netcdf method.

Then we had a proposal to change representation yet again to be inline with what NetCDF CF convention is using. This had a lot of merit and provided with cleaner solution than what we had at that point in time, so we went along and implemented that as part of 1.8.0 release.

Current State

Changes Required to Notebooks

Those changes are difficult to make in a way that will work with older and newer versions of datacube, so I propose we just change to support newest version only 1.8.1 and forward. Probably 1.8.2 that is yet to be released as there has been some more fixes to .geobox handling for external sources since 1.8.1.

I have been able to run following with current datacube code after some changes to notebooks and to xr_rasterize function

Applying_WOfS_bitmasking.ipynb .................                                     [  6%]
Calculating_band_indices.ipynb .........                                             [ 10%]
Contour_extraction.ipynb ................                                            [ 16%]
Exporting_GeoTIFFs.ipynb ..........                                                  [ 20%]
Exporting_NetCDFs.ipynb .........                                                    [ 24%]
Image_segmentation.ipynb ...............                                             [ 30%]
Imagery_on_web_map.ipynb ............                                                [ 35%]
Integrating_external_data.ipynb ............                                         [ 40%]
Machine_learning_with_ODC.ipynb ....................                                 [ 48%]
Masking_data.ipynb ..............                                                    [ 53%]
Opening_GeoTIFFs_NetCDFs.ipynb ...................                                   [ 61%]
Pan_sharpening_Brovey.ipynb .............                                            [ 66%]
Rasterize_vectorize.ipynb .............                                              [ 71%]
Using_load_ard.ipynb ................                                                [ 78%]
Virtual_products.ipynb ...........................................                   [ 95%]
Working_with_time.ipynb ............                                                 [100%]

but there are way more notebooks than these...

The tricky part is synchronising updates to sandbox/dea on NCI/notebooks repos.

Kirill888 commented 4 years ago

Needed changes look something like this https://github.com/GeoscienceAustralia/dea-notebooks/commit/39c0532eebf1928a4e4cebe2c34210522ffc2783, except that there are more places

Scripts/notebookapp_miningrehab.py:85:    ds_resampled.attrs["crs"] = dataset_fc.crs
Scripts/notebookapp_changefilmstrips.py:214:        ds_geomedian.attrs['crs'] = ds.crs
Frequently_used_code/Contour_extraction.ipynb:738:    "Another way to get around this issue would be to assign a `crs` attribute to `s2_ds.NDWI` before we attempt to extract contours (e.g. `s2_ds.NDWI.attrs['crs'] = s2_ds.crs`)"
Frequently_used_code/Machine_learning_with_ODC.ipynb:780:    "predicted.attrs['crs'] = geometry.CRS(\"EPSG:3577\")\n",
Real_world_examples/Intertidal_elevation.ipynb:863:    "intertidal_dem_ds.attrs['crs'] = landsat_ds.crs\n",
Kirill888 commented 4 years ago

Using unstable image on AU sandbox and this branch I have all notebooks in Frequently_used_code passing

===================================================== test session starts ======================================================
platform linux -- Python 3.6.9, pytest-5.4.3, py-1.9.0, pluggy-0.13.1
rootdir: /home/jovyan/wk/dea-notebooks/Frequently_used_code
plugins: nbval-0.9.5
collected 292 items                                                                                                            

Analyse_multiple_polygons.ipynb .............                                                                            [  4%]
Animated_timeseries.ipynb .................                                                                              [ 10%]
Applying_WOfS_bitmasking.ipynb .................                                                                         [ 16%]
Calculating_band_indices.ipynb .........                                                                                 [ 19%]
Contour_extraction.ipynb ..............                                                                                  [ 23%]
Exporting_GeoTIFFs.ipynb ..........                                                                                      [ 27%]
Exporting_NetCDFs.ipynb .........                                                                                        [ 30%]
Geomedian_composites.ipynb ........                                                                                      [ 33%]
Image_segmentation.ipynb ...............                                                                                 [ 38%]
Imagery_on_web_map.ipynb ............                                                                                    [ 42%]
Integrating_external_data.ipynb ............                                                                             [ 46%]
Machine_learning_with_ODC.ipynb ....................                                                                     [ 53%]
Masking_data.ipynb ..............                                                                                        [ 58%]
Opening_GeoTIFFs_NetCDFs.ipynb ............                                                                              [ 62%]
Pan_sharpening_Brovey.ipynb .............                                                                                [ 66%]
Rasterize_vectorize.ipynb ...........                                                                                    [ 70%]
Tidal_modelling.ipynb ................                                                                                   [ 76%]
Using_load_ard.ipynb ...............                                                                                     [ 81%]
Virtual_products.ipynb ...........................................                                                       [ 95%]
Working_with_time.ipynb ............                                                                                     [100%]

======================================================= warnings summary =======================================================
/env/lib/python3.6/site-packages/nbval/plugin.py:115: 20 tests with warnings
  /env/lib/python3.6/site-packages/nbval/plugin.py:115: PytestDeprecationWarning: direct construction of IPyNbFile has been deprecated, please use IPyNbFile.from_parent
    return IPyNbFile(path, parent)

/env/lib/python3.6/site-packages/nbval/plugin.py:312: 292 tests with warnings
  /env/lib/python3.6/site-packages/nbval/plugin.py:312: PytestDeprecationWarning: direct construction of IPyNbCell has been deprecated, please use IPyNbCell.from_parent
    cell, options)

Analyse_multiple_polygons.ipynb::Cell 0
  /env/lib/python3.6/site-packages/jupyter_client/manager.py:66: DeprecationWarning: KernelManager._kernel_spec_manager_changed is deprecated in traitlets 4.1: use @observe and @unobserve instead.
    def _kernel_spec_manager_changed(self):

Analyse_multiple_polygons.ipynb: 1 test with warning
Animated_timeseries.ipynb: 1 test with warning
Applying_WOfS_bitmasking.ipynb: 1 test with warning
Calculating_band_indices.ipynb: 1 test with warning
Contour_extraction.ipynb: 1 test with warning
Exporting_GeoTIFFs.ipynb: 1 test with warning
Exporting_NetCDFs.ipynb: 1 test with warning
Geomedian_composites.ipynb: 1 test with warning
Image_segmentation.ipynb: 1 test with warning
Imagery_on_web_map.ipynb: 1 test with warning
Integrating_external_data.ipynb: 1 test with warning
Machine_learning_with_ODC.ipynb: 1 test with warning
Masking_data.ipynb: 1 test with warning
Opening_GeoTIFFs_NetCDFs.ipynb: 1 test with warning
Pan_sharpening_Brovey.ipynb: 1 test with warning
Rasterize_vectorize.ipynb: 1 test with warning
Tidal_modelling.ipynb: 1 test with warning
Using_load_ard.ipynb: 1 test with warning
Virtual_products.ipynb: 1 test with warning
Working_with_time.ipynb: 1 test with warning
  /env/lib/python3.6/site-packages/jupyter_client/manager.py:358: FutureWarning: Method cleanup(connection_file=True) is deprecated, use cleanup_resources(restart=False).
    FutureWarning)

-- Docs: https://docs.pytest.org/en/latest/warnings.html
======================================== 292 passed, 333 warnings in 1407.62s (0:23:27) ========================================
Kirill888 commented 4 years ago

We should probably combine this work with replacing calls to write_geotiff with write_cog (#586)

cbur24 commented 4 years ago

I didn't see @Kirill888 comment above.....but I agree, this issue dovetails with our need to update write_geotiff examples to write_cog, as many of the instances in the repository of adding crs to attributes (ie ds['attrs'].crs = ds.crs) occurs during the conversion of dataarrays to datasets to satisfy write_geotiff.

Kirill888 commented 4 years ago

write_geotiff should not need crs injection, it accesses that information via .geobox, and .geobox probably survives in those notebooks just fine. Also swapping out write_geotiff does not need to wait for 1.8.1, only 1.8.0, which is already everywhere. Only code that must use assign_crs is dependent on 1.8.1.

Kirill888 commented 4 years ago

Looks like synchronizing everything would be rather difficult, so I have pushed changes to datacube-core that take care of the current state. So we can proceed with updating sandbox images independently of this work. Still I'd rather see these fixes done earlier than later as notebooks are an essential part of documentation for datacube and related libs so keeping them up to date is a priority.

robbibt commented 1 year ago

This will be resolved better by switching to odc.geo tools, will deal with that in more specific issues