developmentseed / titiler-cmr

Dynamic tiles from CMR queries
MIT License

Handle case where each variable is represented as a separate granule #32

Open hrodmn opened 1 month ago

hrodmn commented 1 month ago

Right now titiler-cmr can handle the case where each granule represents a distinct time point and every granule has the same set of variables. There are many datasets in CMR where each granule represents the same timestep but contains a different variable. For example, the Regridded Harmonized World Soil Database v1.2 dataset has 27 granules that each contain estimates of a different soil property.

To handle this case we could use the bands_regex parameter in the xarray backend case to filter the granule results down to the one that matches the regex. We would need to change the format of mosaic_assets in this case, since the ZarrReader can't handle the dictionary of {band: url} entries that get_assets produces when you provide bands_regex.
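A minimal sketch of that reformatting step, assuming each asset is a dictionary with a "url" entry (either a plain string or a {band: url} mapping) and a "provider" entry, as the diff in this issue suggests. The function name `flatten_regex_assets` and the sample URL are hypothetical and not part of titiler-cmr:

```python
from typing import Any, Dict, List


def flatten_regex_assets(mosaic_assets: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Collapse {"url": {band: url}} assets into {"url": url} assets.

    Assumption: the regex is expected to match exactly one URL per asset,
    so anything else is treated as an error.
    """
    flattened = []
    for asset in mosaic_assets:
        urls = asset["url"]
        if isinstance(urls, str):
            # Already in the single-URL shape; pass it through unchanged.
            flattened.append({"url": urls, "provider": asset["provider"]})
            continue
        if len(urls) != 1:
            raise ValueError(
                f"expected exactly one matching URL per asset, got {len(urls)}"
            )
        (url,) = urls.values()
        flattened.append({"url": url, "provider": asset["provider"]})
    return flattened
```

Checking the length of the {band: url} mapping, rather than the length of the asset dictionary itself, is what distinguishes "the regex matched several files" from "the asset has several keys".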

I hacked a solution together just to see if it is feasible. Here are some tiles from the soil pH layer of that dataset: [image]

$ git diff titiler/.
diff --git a/titiler/cmr/backend.py b/titiler/cmr/backend.py
index 50307e7..1d2a676 100644
--- a/titiler/cmr/backend.py
+++ b/titiler/cmr/backend.py
@@ -231,12 +231,21 @@ class CMRBackend(BaseBackend):
             access=s3_auth_config.access,
             bands_regex=bands_regex,
         )
-
         if not mosaic_assets:
             raise NoAssetFoundError(
                 f"No assets found for tile {tile_z}-{tile_x}-{tile_y}"
             )

+        # reformat the mosaic_assets to match expectation for xarray backend
+        # would only want to do this for the backend="xarray" case...
+        if bands_regex:
+            asset = mosaic_assets[0]
+            if len(asset) > 1:
+                raise ValueError("bands_regex returned multiple assets!")
+            url = list(asset["url"].values())[0]
+            provider = asset["provider"]
+            mosaic_assets = [{"url": url, "provider": provider}]
+
         def _reader(asset: Asset, x: int, y: int, z: int, **kwargs: Any) -> ImageData:
             if (
                 s3_auth_config.strategy == "environment"
diff --git a/titiler/cmr/factory.py b/titiler/cmr/factory.py
index e10d3af..8e7b73d 100644
--- a/titiler/cmr/factory.py
+++ b/titiler/cmr/factory.py
@@ -121,7 +121,9 @@ def parse_reader_options(

     if reader_params.backend == "xarray":
         reader = ZarrReader
-        read_options = {}
+        read_options = {
+            "bands_regex": rasterio_params.bands_regex,
+        }

         options = {
             "variable": zarr_params.variable,

vincentsarago commented 1 month ago

🤯 there are too many ways to handle this

First, I think we should rename bands_regex -> assets_regex

I think I lost track, but for an xarray dataset we need a variable= option, right? I'm not sure why we need to pass bands_regex to the reader (with read_options); the variable should be one asset from the list of assets returned.
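A rough sketch of picking one asset out of the returned list before the reader is involved, so the reader never needs to see the regex. The function name `select_asset_url`, the `assets_regex` argument (following the proposed rename), and the example URLs are all hypothetical:

```python
import re
from typing import List


def select_asset_url(urls: List[str], assets_regex: str) -> str:
    """Return the single URL that matches assets_regex.

    Assumption: each variable lives in its own granule, so exactly one
    URL should match the pattern.
    """
    pattern = re.compile(assets_regex)
    matches = [url for url in urls if pattern.search(url)]
    if len(matches) != 1:
        raise ValueError(
            f"assets_regex {assets_regex!r} matched {len(matches)} assets, expected 1"
        )
    return matches[0]
```

With this shape, the reader would only receive the selected URL plus the variable name.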

hrodmn commented 1 month ago

The mixed xarray/rasterio logic is starting to get a bit messy, with conditional checks piling up in the single CMRBackend class. Maybe we are at the point where it would be cleaner to have separate backends: CMRRasterioBackend and CMRXarrayBackend. There could be some shared utility functions, but this structure might make it easier to do the right thing for each of these cases.
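One way that split could look, as a sketch only. The class names CMRRasterioBackend and CMRXarrayBackend come from the comment above; the base class `CMRBackendBase`, the methods `assets_for_tile` and `prepare_asset`, and the asset shape are assumptions for illustration and do not reflect the actual titiler-cmr code:

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

Asset = Dict[str, Any]


class CMRBackendBase(ABC):
    """Hypothetical shared base: common checks live here, once."""

    def assets_for_tile(self, found: List[Asset]) -> List[Asset]:
        if not found:
            raise ValueError("No assets found")
        return [self.prepare_asset(asset) for asset in found]

    @abstractmethod
    def prepare_asset(self, asset: Asset) -> Asset:
        """Reshape one asset into the form this backend's reader expects."""


class CMRRasterioBackend(CMRBackendBase):
    def prepare_asset(self, asset: Asset) -> Asset:
        # Assumed: the rasterio path consumes the asset as-is,
        # including a {band: url} mapping.
        return asset


class CMRXarrayBackend(CMRBackendBase):
    def prepare_asset(self, asset: Asset) -> Asset:
        # Assumed: the xarray reader wants a single URL string.
        urls = asset["url"]
        if isinstance(urls, dict):
            if len(urls) != 1:
                raise ValueError("expected exactly one URL for the xarray backend")
            (url,) = urls.values()
            return {**asset, "url": url}
        return asset
```

Each subclass then carries only its own reshaping rule, and the `backend == "xarray"` conditionals disappear from the shared code path.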