cogeotiff / rio-tiler

User friendly Rasterio plugin to read raster datasets.
https://cogeotiff.github.io/rio-tiler/
BSD 3-Clause "New" or "Revised" License

Forwarding `ImageData.array.mask` in NPZ output #685

Closed (daisy12321 closed 7 months ago)

daisy12321 commented 7 months ago

In this line, https://github.com/cogeotiff/rio-tiler/blob/6343b571a367ef63a10d6807e3d907c3283ebb20/rio_tiler/models.py#L763, should it be using `self.array.mask` instead of `self.mask`? My understanding is that writing the mask as uint8 is not the intention.

vincentsarago commented 7 months ago

> should it be using self.array.mask

No, `ImageData.array.mask` is the numpy boolean mask of the masked array and has the same shape as the data array itself (e.g. one mask per band), which is why we use `ImageData.mask`, a proper alpha band compatible with rasterio/GDAL image encoding.

https://github.com/cogeotiff/rio-tiler/blob/6343b571a367ef63a10d6807e3d907c3283ebb20/rio_tiler/models.py#L353-L356
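To illustrate the difference between the two masks, here is a minimal numpy-only sketch. It does not import rio-tiler: `arr` is a hypothetical stand-in for `ImageData.array`, and the reduction used to build `alpha` (a pixel is treated as masked if any band is masked, encoded as 0 for masked and 255 for valid) is an assumption for illustration; the linked lines in `models.py` are the authoritative definition.

```python
import numpy as np

# Hypothetical 2-band, 2x2 image standing in for ImageData.array.
# In a numpy masked array, True means "masked / invalid".
data = np.arange(8, dtype="uint16").reshape(2, 2, 2)
band_mask = np.zeros((2, 2, 2), dtype=bool)
band_mask[0, 0, 0] = True  # mask one pixel, in band 0 only
arr = np.ma.MaskedArray(data, mask=band_mask)

# Per-band boolean mask: same shape and band count as the data.
print(arr.mask.shape, arr.mask.dtype)

# ASSUMPTION: collapse the per-band mask into a single 2D alpha band,
# 0 where any band is masked and 255 elsewhere, stored as uint8.
alpha = np.where(np.logical_or.reduce(arr.mask), 0, 255).astype("uint8")
print(alpha.shape, alpha.dtype)
```

The per-band mask keeps the information about which band was masked, while the single alpha band is the form that image encoders generally expect.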

daisy12321 commented 7 months ago

Hmmm, that's true for the rasterio/gdal case, but if we are saving in NPZ format, there isn't really a required format, right? Would it make sense to just use numpy's masked array mask format?

vincentsarago commented 7 months ago

Ah yes, for NPZ we could save the whole numpy mask array. This might complicate the workflow and create some confusion (having multiple mask/alpha types).

Is there any specific reason why a user would want the numpy masked array (ImageData.array.mask) instead of the alpha band (ImageData.mask)?

daisy12321 commented 7 months ago

Our use case is to save to a file and read it back in as numpy arrays, effectively doing just np.savez_compressed(f, data=img.array, mask=img.array.mask). I just need the actual data and mask to make sure the downstream calculations are aware of what's masked.

If it adds too much complexity, I can always just call np.savez_compressed directly, so no worries about accommodating this particular use case.
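The direct workaround described above could look roughly like the following sketch. It uses numpy only: `arr` is a hypothetical stand-in for `img.array`, an in-memory buffer replaces the real file, and the values are stored via `arr.data` to make explicit that the raw data and the mask are saved as separate arrays.

```python
import io

import numpy as np

# Hypothetical 2-band, 2x2 masked array standing in for img.array.
data = np.arange(8, dtype="float32").reshape(2, 2, 2)
mask = np.zeros((2, 2, 2), dtype=bool)
mask[1, 0, 1] = True  # one masked pixel in band 1
arr = np.ma.MaskedArray(data, mask=mask)

# Save the raw values and the full per-band boolean mask side by side.
# getmaskarray always returns a full boolean array, even if nothing is masked.
buf = io.BytesIO()  # stand-in for a file opened in binary mode
np.savez_compressed(buf, data=arr.data, mask=np.ma.getmaskarray(arr))
buf.seek(0)

# Read back and rebuild the masked array for downstream calculations.
with np.load(buf) as npz:
    restored = np.ma.MaskedArray(npz["data"], mask=npz["mask"])

print(restored.shape, restored.count())  # count() = number of unmasked values
```

Because the mask is restored per band, downstream reductions such as `restored.mean()` skip exactly the values that were masked before saving.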