GeoscienceAustralia / dea-notebooks

Repository for Digital Earth Australia Jupyter Notebooks: tools and workflows for geospatial analysis with Open Data Cube and Xarray
https://docs.dea.ga.gov.au/notebooks/
Apache License 2.0
430 stars 126 forks source link

Suggested feature: 1m global canopy height product (AWS interoperability) #1232

Open wwoodgate opened 3 weeks ago

wwoodgate commented 3 weeks ago

Hi DEA team,

Building on the fantastic MS Planetary Computer notebook functionality that enables users to gain access to the MS Data Catalogue, I was wondering if similar functionality could be added to load in Open (spatial) Data from AWS? I'm specifically thinking of the recent 1m High Resolution (global) Canopy Height Maps by WRI and Meta (https://registry.opendata.aws/dataforgood-fb-forests/).

I had a poke around the files but couldn't see an easy way to do a spatial query (on say a specified bbox).

Just a suggestion!

Cheers

robbibt commented 3 weeks ago

Hey @wwoodgate! Yep, it would be really neat to provide easier access to that data - it's pretty incredible stuff! Unfortunately a challenge is that the data doesn't currently have STAC metadata or a STAC search API, which means that the approach used in the Planetary Computer notebook (using Planetery Computer's STAC API to find and load the data for an AOI) doesn't directly apply.

It would be possible to develop a funky method that uses Meta's custom index file as a lookup - e.g. work out what file paths to load for an AOI. It would be a bit more bespoke, but probably pretty useful given how enormous this data is! Will have a think - and please feel free to share anything you put together yourself (we could try and get it into a notebook here)!

robbibt commented 3 weeks ago

@wwoodgate Here's a possible approach - use geopandas to load the index tile file for a bounding box, then load the corresponding tile and clip to your AOI using odc.geo:

import os
import rioxarray
import geopandas as gpd
import odc.geo.xr
from odc.geo.geom import BoundingBox

# Allow anonymous access to Meta canopy height data on Amazon S3
os.environ["AWS_NO_SIGN_REQUEST"] = "YES"

# Set study area
bbox = BoundingBox.from_xy(
    x=(152.436658, 152.453911),
    y=(-32.387423, -32.402873),
    crs="EPSG:4326",
)

# Read index tiles within bbox
tiles_gdf = gpd.read_file(
    "s3://dataforgood-fb-data/forests/v1/alsgedi_global_v6_float/tiles.geojson",
    bbox=bbox,
)

# Load specific tile
tile_id = tiles_gdf.tile[0]
ds = rioxarray.open_rasterio(f"s3://dataforgood-fb-data/forests/v1/alsgedi_global_v6_float/chm/{tile_id}.tif").squeeze()

# Crop to extent of bbox
ds_cropped = ds.odc.crop(bbox.polygon)

# Set zero values to NaN for visualisation
ds_cropped = ds_cropped.where(ds_cropped > 0)

# Export to file
ds_cropped.odc.write_cog(f"meta_chm_{tide_id}.tif")

# Explore on map
ds_cropped.odc.explore(cmap="Greens", vmin=0, vmax=15)

image

wwoodgate commented 2 weeks ago

Looks great! Thanks very much @robbibt

One minor hiccup was an error was thrown when reading in the gpd. The error was "CPLE_AWSAccessDeniedError: Access Denied" so perhaps something to do with the s3 mount?

I changed the path to ""https://dataforgood-fb-data.s3.amazonaws.com/forests/v1/alsgedi_global_v6_float/tiles.geojson" and it worked nicely.

Thanks a bunch for this!!

robbibt commented 2 weeks ago

Setting this should have in theory given you access:

os.environ["AWS_NO_SIGN_REQUEST"] = "YES"

But yep, access via HTTPS should work as a workaround.

I was messing around with a notebook example for this (comparing cover from DEA Fractional Cover with height from CHM), so will post it here when ready!

wwoodgate commented 2 weeks ago

Yup adding that fixes the issue.

Looking forward to the cover vs height comparison!