gjoseph92 / stackstac

Turn a STAC catalog into a dask-based xarray
https://stackstac.readthedocs.io
MIT License
248 stars 49 forks source link

Black stripes on mosaicking #217

Closed lhcorralo closed 1 year ago

lhcorralo commented 1 year ago

Hi!

Description

I am running some tests with stackstac, and I have an issue regarding the mosaicking: some stripes appear. Note that I did the tests in a Jupyter notebook, despite I guess it will not be any difference

Code

The code is


from dask.distributed import Client, LocalCluster
from pystac import ItemCollection
from pystac_client import Client
import stackstac
import numpy as np
from matplotlib import pyplot as plt

# Init Dask
cluster = LocalCluster(processes=False, n_workers=2, threads_per_worker=2, memory_limit='8GiB')
client = Client(cluster)

# Search parameters
bbox = [-3.5, 40.0, -1.0, 41.5]

# Get items
catalog = Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")
search = catalog.search(collections=["cop-dem-glo-30"], bbox=bbox)
items_pc: ItemCollection = search.item_collection()

# Stack and mosaic
stack = stackstac.stack(items_pc)
mosaic = stackstac.mosaic(stack)

# plot
fig = plt.figure()
plt.imshow(mosaic[0].compute())

and the result is as appears in mosaic

As the files are DEM files, mosaicking results should be similar using another functions, so I tried max in order to evaluate if the problem was regarding stacking/input sources or mosaicking.

mosaic = stack.max(dim='time')

# plot
fig = plt.figure()
plt.imshow(mosaic[0].compute())

In this case, the operation seems to give good results

max

Let me know if I can provide more info

Thanks!

gjoseph92 commented 1 year ago

Thanks for the good report @lhcorralo. It looks like the underlying issue is https://github.com/rasterio/rasterio/issues/2916.

Short explanation: the xarray is basically composed of multiple tiles, spatially. It's also reading from multiple items, which cover different areas. When a tile doesn't overlap with an item at all, stackstac has an optimization to return all NaNs for the tile. But when the tile overlaps with an item, we use rasterio to read it. It looks like in this particular dataset, the GeoTIFFs don't have a nodata value set, so GDAL is filling in 0s for the pixels outside the bounds of the item, but not actually telling us that they should be masked out. However, the mosaic operation is only treating NaNs as nodata, so the 0s are left in, resulting in the stripes. That's also why max works: any actual data value is greater than 0.

I'll push up a fix for this, but for now, you can probably work around this with:

stack = stackstac.stack(items_pc, fill_value=0)
mosaic = stackstac.mosaic(stack, nodata=0)

That will make stackstac use 0s for empty tiles, matching up with GDAL's nodata value. Then you mosaic out all the 0s.

As a sidenote, this is yet another reason to move away from VRTs, which already needs to happen: https://github.com/gjoseph92/stackstac/issues/196.

lhcorralo commented 1 year ago

Thanks!

Yes, it works as a workaround: it is easy to get a fill_value/nodata for elevation values (I have seen -9999 sometimes), so no problem.

gjoseph92 commented 1 year ago

FYI @lhcorralo, the fix is now released in 0.5.0, so you should be able to go back to using mosaic. Thanks for the great bug report!