Closed Kirill888 closed 1 year ago
Needed changes look something like this https://github.com/GeoscienceAustralia/dea-notebooks/commit/39c0532eebf1928a4e4cebe2c34210522ffc2783, except that there are more places
Scripts/notebookapp_miningrehab.py:85: ds_resampled.attrs["crs"] = dataset_fc.crs
Scripts/notebookapp_changefilmstrips.py:214: ds_geomedian.attrs['crs'] = ds.crs
Frequently_used_code/Contour_extraction.ipynb:738: "Another way to get around this issue would be to assign a `crs` attribute to `s2_ds.NDWI` before we attempt to extract contours (e.g. `s2_ds.NDWI.attrs['crs'] = s2_ds.crs`)"
Frequently_used_code/Machine_learning_with_ODC.ipynb:780: "predicted.attrs['crs'] = geometry.CRS(\"EPSG:3577\")\n",
Real_world_examples/Intertidal_elevation.ipynb:863: "intertidal_dem_ds.attrs['crs'] = landsat_ds.crs\n",
Using unstable
image on AU sandbox and this branch I have all notebooks in Frequently_used_code
passing
===================================================== test session starts ======================================================
platform linux -- Python 3.6.9, pytest-5.4.3, py-1.9.0, pluggy-0.13.1
rootdir: /home/jovyan/wk/dea-notebooks/Frequently_used_code
plugins: nbval-0.9.5
collected 292 items
Analyse_multiple_polygons.ipynb ............. [ 4%]
Animated_timeseries.ipynb ................. [ 10%]
Applying_WOfS_bitmasking.ipynb ................. [ 16%]
Calculating_band_indices.ipynb ......... [ 19%]
Contour_extraction.ipynb .............. [ 23%]
Exporting_GeoTIFFs.ipynb .......... [ 27%]
Exporting_NetCDFs.ipynb ......... [ 30%]
Geomedian_composites.ipynb ........ [ 33%]
Image_segmentation.ipynb ............... [ 38%]
Imagery_on_web_map.ipynb ............ [ 42%]
Integrating_external_data.ipynb ............ [ 46%]
Machine_learning_with_ODC.ipynb .................... [ 53%]
Masking_data.ipynb .............. [ 58%]
Opening_GeoTIFFs_NetCDFs.ipynb ............ [ 62%]
Pan_sharpening_Brovey.ipynb ............. [ 66%]
Rasterize_vectorize.ipynb ........... [ 70%]
Tidal_modelling.ipynb ................ [ 76%]
Using_load_ard.ipynb ............... [ 81%]
Virtual_products.ipynb ........................................... [ 95%]
Working_with_time.ipynb ............ [100%]
======================================================= warnings summary =======================================================
/env/lib/python3.6/site-packages/nbval/plugin.py:115: 20 tests with warnings
/env/lib/python3.6/site-packages/nbval/plugin.py:115: PytestDeprecationWarning: direct construction of IPyNbFile has been deprecated, please use IPyNbFile.from_parent
return IPyNbFile(path, parent)
/env/lib/python3.6/site-packages/nbval/plugin.py:312: 292 tests with warnings
/env/lib/python3.6/site-packages/nbval/plugin.py:312: PytestDeprecationWarning: direct construction of IPyNbCell has been deprecated, please use IPyNbCell.from_parent
cell, options)
Analyse_multiple_polygons.ipynb::Cell 0
/env/lib/python3.6/site-packages/jupyter_client/manager.py:66: DeprecationWarning: KernelManager._kernel_spec_manager_changed is deprecated in traitlets 4.1: use @observe and @unobserve instead.
def _kernel_spec_manager_changed(self):
Analyse_multiple_polygons.ipynb: 1 test with warning
Animated_timeseries.ipynb: 1 test with warning
Applying_WOfS_bitmasking.ipynb: 1 test with warning
Calculating_band_indices.ipynb: 1 test with warning
Contour_extraction.ipynb: 1 test with warning
Exporting_GeoTIFFs.ipynb: 1 test with warning
Exporting_NetCDFs.ipynb: 1 test with warning
Geomedian_composites.ipynb: 1 test with warning
Image_segmentation.ipynb: 1 test with warning
Imagery_on_web_map.ipynb: 1 test with warning
Integrating_external_data.ipynb: 1 test with warning
Machine_learning_with_ODC.ipynb: 1 test with warning
Masking_data.ipynb: 1 test with warning
Opening_GeoTIFFs_NetCDFs.ipynb: 1 test with warning
Pan_sharpening_Brovey.ipynb: 1 test with warning
Rasterize_vectorize.ipynb: 1 test with warning
Tidal_modelling.ipynb: 1 test with warning
Using_load_ard.ipynb: 1 test with warning
Virtual_products.ipynb: 1 test with warning
Working_with_time.ipynb: 1 test with warning
/env/lib/python3.6/site-packages/jupyter_client/manager.py:358: FutureWarning: Method cleanup(connection_file=True) is deprecated, use cleanup_resources(restart=False).
FutureWarning)
-- Docs: https://docs.pytest.org/en/latest/warnings.html
======================================== 292 passed, 333 warnings in 1407.62s (0:23:27) ========================================
We should probably combine this work with replacing calls to write_geotiff
with write_cog
(#586)
I didn't see @Kirill888 comment above.....but I agree, this issue dovetails with our need to update write_geotiff
examples to write_cog
, as many of the instances in the repository of adding crs
to attributes (ie ds['attrs'].crs = ds.crs
) occurs during the conversion of dataarrays to datasets to satisfy write_geotiff
.
write_geotiff
should not need crs
injection, it accesses that information via .geobox
, and .geobox
probably survives in those notebooks just fine. Also swapping out write_geotiff
does not need to wait for 1.8.1
, only 1.8.0
, which is already everywhere. Only code that must use assign_crs
is dependent on 1.8.1
.
Looks like synchronizing everything would be rather difficult, so I have pushed changes to datacube-core that take care of the current state. So we can proceed with updating sandbox images independently of this work. Still I'd rather see these fixes done earlier than later as notebooks are an essential part of documentation for datacube and related libs so keeping them up to date is a priority.
This will be resolved better by switching to odc.geo
tools, will deal with that in more specific issues
History
The way CRS information is stored and accessed in datacube has changed over the years. Those changes were driven primarily by two factors: better interoperability with xarray/rasterio and other geospatial libraries and better "handling of various xarray operations".
Initially CRS information was stored as a datacube
CRS
class in an attribute namedcrs
on each datavariable and on dataset object returned fromdc.load(..)
. This approach was problematic as it prevented methods likexarray.to_netcdf(..)
from working (didn't know what to do withcrs
attribute that is not a string). It also suffered from "missing attribute" problem, some common xarray operations like type casting could dropcrs
attribute.Datacube code was then changed for a brief period to store
crs
attribute on thex,y
coordinates, and to be a string and not aCRS
class, this allowed for better "survivability" of CRS data across per-pixel operations, as coordinates are essentially copied from source to destination including all the attributes attached to them. Turningcrs
into a string allowed for ability to save data withxarray.to_netcdf
method.Then we had a proposal to change representation yet again to be inline with what NetCDF CF convention is using. This had a lot of merit and provided with cleaner solution than what we had at that point in time, so we went along and implemented that as part of 1.8.0 release.
Current State
spatial_ref
.geobox
"dynamic property", available after importingdatacube
.geobox
can also detect CRS information as returned byxarray.open_(dataset|rasterio)
.crs
property is no longer populated withCRS
class for data returned bydc.load
, one must use.geobox.crs
.crs
property is populated byxarray.open_rasterio
and is set to a string representation of the CRS, usually WKT or EPSG code.geobox
should returnNone
, if it raises an Error of any sort, this should be considered a bug and fixed..geobox
survives round trip to netcdf and back1.8.1
, there is nowdatacube.utils.geometry.assign_crs
method. This is a preferred mechanism to add CRS to unregistered data and to convert simple attribute based representation to a more robust coordinates based representation.Changes Required to Notebooks
xx.crs
, instead accessxx.geobox.crs
, addassert xx.geobox is not None
to indicate to people reading code and to check at run-time that crs information is required/availablecrs
attribute directly into dataset/dataarray. You probably don't need to do it anymore, it probably survived computation just fine, or has been detected if data was returned from one of xarray load functionsxarray
methods, useassign_crs
to convert to Datacube format that survives per pixel operations betterxx.geobox.dims
, this does not work for external datasets inESPG:4326
loaded via xarray open functions (it names coordinates asx,y
always), usedatacube.utils.spatial_dims
instead.Those changes are difficult to make in a way that will work with older and newer versions of datacube, so I propose we just change to support newest version only
1.8.1
and forward. Probably1.8.2
that is yet to be released as there has been some more fixes to.geobox
handling for external sources since1.8.1
.I have been able to run following with current datacube code after some changes to notebooks and to
xr_rasterize
functionbut there are way more notebooks than these...
The tricky part is synchronising updates to sandbox/dea on NCI/notebooks repos.