Optimize calls to visualize bounding boxes from mosaics

abarciauskas-bgse commented 1 year ago

To create a histogram from a mosaic, I am using the titiler mosaic assets endpoint to discover which assets fall into the current bounding box and then use those to clip and combine datasets: https://github.com/developmentseed/stac_ipyleaflet/blob/main/stac_ipyleaflet/core.py#L127

Is there a better way to do this? @wildintellect mentioned that titiler supports returning numpy arrays and we were wondering if it would be possible to call the mosaics endpoint with the bounding box and get back a numpy array. This looks possible via https://titiler.maap-project.org/docs#/SpatioTemporal%20Asset%20Catalog/tile_stac_tiles__TileMatrixSetId___z___x___y___format__get using npy as the format, but what should XYZ be? It doesn't look there is anyway to request the data just for a bounding box either. If I had to guess, I would imagine that we need first determine which (numpy) tiles to request based on the bounding box (giving us the x,y) and then clip those tiles to the bounding box. And then the zoom may depend on what level of accuracy we require for the histogram.

@vincentsarago any suggestions?

vincentsarago commented 1 year ago

@abarciauskas-bgse 👋

You could use rio-tiler Mosaic feature to fetch all image and then merge them into one image and then get the statistics from it 👇

import numpy
from rio_tiler.io import Reader
from rio_tiler.mosaic import mosaic_reader
from rio_tiler.models import ImageData

    mosaic_url = match.groups()[0]
    str_bounds = f"{bounds[0]},{bounds[1]},{bounds[2]},{bounds[3]}"
    assets_endpoint = f"{titiler_url}/mosaicjson/{str_bounds}/assets?url={mosaic_url}/mosaicjson"
    # create a dataset from multiple COGs?
    assets_response = requests.get(assets_endpoint)
    datasets = []

    assets = assets_response.json()

    # see https://github.com/cogeotiff/rio-tiler/blob/main/rio_tiler/io/rasterio.py#L368-L380
    def _part_read(src_path: str, *args, **kwargs) -> ImageData:
        with Reader(src_path) as src:
            return src.part(bbox, *args, **kwargs)

    # mosaic_reader will use multithreading to distribute the image fetching 
    # and them merge all arrays together
    img, _ = mosaic_reader(assets, bounds, max_size=512). # change the max_size to make it faster/slower 

    data = img.as_masked()  # create Masked Array from ImageData
    # Avoid non masked nan/inf values
    numpy.ma.fix_invalid(data, copy=False)

    hist = {}
    for ii, b in enumerate(img.count):
        h_counts, h_keys = numpy.histogram(data[b].compressed())
        hist[f"b{ii + 1}"] = [h_counts.tolist(), h_keys.tolist()]

☝️ This will not work if the image do not have the same resolution (because we won't be able to overlay them). if you know the resolution you want to use you can use width=.., height=.. instead of max_size=512 (it will ensure you create the same array size for all the images.

I'm also assuming that the assets are COG and not STAC Items (if not then you have to use STACReader and pass a list of stac assets to read.

@wildintellect mentioned that titiler supports returning numpy arrays and we were wondering if it would be possible to call the mosaics endpoint with the bounding box and get back a numpy array.

Yes, TiTiler can return numpy array https://developmentseed.org/titiler/examples/notebooks/Working_with_NumpyTile/

You can use the /crop endpoint for each file https://titiler.maap-project.org/docs#/SpatioTemporal%20Asset%20Catalog/part_stac_crop__minx___miny___maxx___maxy___format__get

    for asset in assets_response.json():
        try:
            crop_endpoint = f"{titiler_url}/cog/crop/{str_bounds}.npy?url={asset}&max_size=512"  # Same here you can either use max_size or width&height
            res = requests.get(crop_endpoint)
            arr = numpy.load(BytesIO(r.content))
            tile, mask = arr[0:-1], arr[-1]

            # TODO:
            # convert tile/mask to xarray dataset
            datasets.happen(ds)
        except:
            pass

        ds = xr.combine_by_coords(datasets, fill_value=datasets[0]._FillValue)

abarciauskas-bgse commented 1 year ago

Thanks @vincentsarago will take a closer look at this solution.

MAAP-Project / stac_ipyleaflet

Optimize calls to visualize bounding boxes from mosaics #4