DHI / terracotta

A light-weight, versatile XYZ tile server, built with Flask and Rasterio 🌍
https://terracotta-python.readthedocs.org
MIT License

Figure out how to handle categorical data #44

Closed. dionhaefner closed this issue 6 years ago

dionhaefner commented 6 years ago

Challenges:

j08lue commented 6 years ago

Another thing:

mads-gras commented 6 years ago

This will be needed rather soon; the first version of Terracotta already supported it.

I can provide a case we can work on ;)

dionhaefner commented 6 years ago

Actually quite a challenging problem. This will require another lengthy API planning session. Looking forward to it 😉

j08lue commented 6 years ago

"This will require another lengthy API planning session. Looking forward to it 😉"

Yes, you and @mrpgraae go into conclave and send up some smoke once you have figured it out...

mrpgraae commented 6 years ago

The Terracotta Council of Elders concludes... ☁️ ☁️ ☁️

There is no good way to implement categorical datasets in a way that Terracotta is agnostic about them. We will have to implement special cases and features for categorical datasets.

Split /legend into /legend and /colormap

/legend should be renamed to /colormap, since that better describes what the call actually returns. A call to /legend/{keys} shall henceforth return the names of the categories in a categorical dataset and their associated hex color strings. Calling /legend on a non-categorical dataset returns an empty dict.
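A minimal sketch of what the proposed /legend response could look like. The function name `legend` and the dict keys `name` and `color` are assumptions made for illustration; they are not part of Terracotta's API.

```python
from typing import Dict, List, Optional


def legend(categories: Optional[List[dict]]) -> Dict[str, str]:
    """Map category names to hex color strings.

    `categories` is None (or empty) for non-categorical datasets,
    in which case an empty dict is returned.
    """
    if not categories:
        return {}
    # The "name" and "color" keys are hypothetical field names.
    return {cat["name"]: cat["color"] for cat in categories}


# Hypothetical categories of a land cover classification
land_cover = [
    {"name": "water", "color": "#0000ff"},
    {"name": "forest", "color": "#008000"},
]

print(legend(land_cover))
print(legend(None))
```

A frontend could render the returned dict directly as a legend, and treat an empty dict as "no categories to show".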

New parameter for driver.insert

Add a new parameter called categories which should be a list of Category named tuples (could be dataclasses in the future). The Category named tuple has 3 attributes:

A new column Categories will be added to the database. The value will be a VARCHAR containing a JSON encoding of the categories. For non-categorical datasets, this column will be null. The presence of this defines whether or not a dataset is categorical.
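The storage scheme could be sketched roughly as follows. The three field names on `Category` (`name`, `value`, `color`) and all helper names are guesses made for illustration only; the only parts taken from the proposal are the JSON-in-VARCHAR encoding and the rule that a null column means "not categorical".

```python
import json
from typing import List, NamedTuple, Optional


class Category(NamedTuple):
    # Hypothetical fields, chosen for illustration only
    name: str
    value: int
    color: str


def encode_categories(categories: Optional[List[Category]]) -> Optional[str]:
    """Serialize categories to the JSON string stored in the Categories column."""
    if categories is None:
        return None  # stored as NULL for non-categorical datasets
    return json.dumps([cat._asdict() for cat in categories])


def decode_categories(column_value: Optional[str]) -> Optional[List[Category]]:
    """Restore the list of Category tuples from the stored column value."""
    if column_value is None:
        return None
    return [Category(**item) for item in json.loads(column_value)]


def is_categorical(column_value: Optional[str]) -> bool:
    """A dataset counts as categorical exactly when the column is not null."""
    return column_value is not None


cats = [Category("water", 1, "#0000ff"), Category("forest", 2, "#008000")]
stored = encode_categories(cats)
```

Decoding `stored` gives back the original list, so the column round-trips without loss.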

The ugly part

We will need to add branches in the low-level functions to handle the categorical case:

Bonus features

When terracotta optimize-rasters is used to cloud-optimize a raster, we should set a GeoTIFF tag specifying what resampling method was used for the overviews. We can then warn the user if they are trying to add a dataset as categorical, when they used something other than nearest as resampling method.
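The warning logic could look something like the sketch below. It works on a plain dict of tags, so no raster library is needed; the tag name `overview_resampling` and the function name are hypothetical and would have to match whatever optimize-rasters actually writes.

```python
import warnings
from typing import Mapping

# Hypothetical tag name; not an existing GeoTIFF or Terracotta tag
OVERVIEW_RESAMPLING_TAG = "overview_resampling"


def check_categorical_resampling(tags: Mapping[str, str]) -> bool:
    """Return False and warn if overviews were not built with nearest resampling.

    Files without the tag are accepted silently, since nothing is known about them.
    """
    method = tags.get(OVERVIEW_RESAMPLING_TAG)
    if method is not None and method.lower() != "nearest":
        warnings.warn(
            f"Overviews were resampled with '{method}'; categorical data "
            "should use 'nearest' to avoid inventing pixel values."
        )
        return False
    return True
```

Any other resampling method blends neighboring pixel values, which can produce class codes that do not exist in the original raster.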

We could allow users to not specify colors for the categories and then auto-generate a nice color cycle for them. This could be done with something like an np.linspace index into the Viridis colormap.
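A rough, dependency-free sketch of the color cycle idea. Instead of np.linspace and the full Viridis lookup table, it uses evenly spaced integer indices into a short hard-coded list of approximate Viridis samples; the palette list and the function name `auto_colors` are stand-ins for illustration.

```python
from typing import List

# Approximate Viridis colors at 0, 25, 50, 75 and 100 percent of the colormap.
# A real implementation would index the full lookup table instead.
VIRIDIS_SAMPLES = ["#440154", "#3b528b", "#21918c", "#5ec962", "#fde725"]


def auto_colors(n_categories: int, palette: List[str] = VIRIDIS_SAMPLES) -> List[str]:
    """Pick n_categories colors at evenly spaced positions along the palette."""
    if n_categories <= 0:
        return []
    if n_categories == 1:
        return [palette[0]]
    last = len(palette) - 1
    # Same idea as rounding np.linspace(0, last, n_categories) to integer indices
    return [palette[round(i * last / (n_categories - 1))] for i in range(n_categories)]


print(auto_colors(3))
```

With this short palette, more than five categories would get repeated colors; indexing the full table avoids that for any realistic number of classes.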

dionhaefner commented 6 years ago

Thoughts:

dionhaefner commented 6 years ago

More problems:

dionhaefner commented 6 years ago

Category-agnostic Terracotta

Recipe to create categorical datasets:

  1. Create keys [type, sensor, date, band]
  2. Ingest categorical data with type=categorical, and other data with type=index or type=reflectance or whatever
  3. During ingestion, add category mapping to extra_metadata, in the form of {category: pixel_value}
  4. In the frontend, get all categorical datasets via /datasets?type=categorical
  5. Get categories via /metadata (includes ingested extra_metadata)
  6. Get imagery via /singleband/categorical/S2/20180820/classification/{z}/{x}/{y}.png?colormap={pixel_value: color, ...} (supplying mapping like this suppresses stretching and uses nearest resampling)
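Steps 3 to 6 can be sketched from the frontend's point of view as below. The base URL, the example categories and colors, and the choice to JSON-encode the colormap query parameter are all assumptions; the thread only fixes the general shape of the request.

```python
import json
import urllib.parse
from typing import Dict, Sequence


def categorical_tile_url(
    base_url: str,
    keys: Sequence[str],
    z: int,
    x: int,
    y: int,
    category_values: Dict[str, int],
    category_colors: Dict[str, str],
) -> str:
    """Build a /singleband tile URL with a {pixel_value: color} colormap."""
    # Turn {category: pixel_value} from extra_metadata into {pixel_value: color}
    colormap = {
        str(pixel_value): category_colors[name]
        for name, pixel_value in category_values.items()
    }
    path = "/".join(["singleband", *keys, str(z), str(x), f"{y}.png"])
    # Assumption: the mapping is passed as a JSON string in the query
    query = urllib.parse.urlencode({"colormap": json.dumps(colormap)})
    return f"{base_url}/{path}?{query}"


url = categorical_tile_url(
    "http://localhost:5000",  # hypothetical server address
    ["categorical", "S2", "20180820", "classification"],
    z=10, x=500, y=300,
    category_values={"water": 1, "forest": 2},  # as stored in extra_metadata
    category_colors={"water": "#0000ff", "forest": "#008000"},  # chosen by the frontend
)
print(url)
```

Keeping the colors on the frontend side means the server only needs to know pixel values, which is what keeps it agnostic about categories.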

Pros

Cons


Whether we should go for this or not depends on how explicit we want to be in supporting categorical data. Is it a niche use case or a core feature? Can we afford to annoy the users a little with this somewhat hacky recipe?

dionhaefner commented 6 years ago

Implemented. We'll see how this recipe works in practice. If it proves too cumbersome, we can still introduce explicit support for categories by supplying them directly to driver.insert.