US-GHG-Center / veda-config-ghg

Veda config for GHG
https://ghg-demo.netlify.app
Other
3 stars 15 forks source link

Clip/Mask data below threshold value #638

Open siddharth0248 opened 3 weeks ago

siddharth0248 commented 3 weeks ago

Some data providers request clipping data below a specific threshold value so that the colormap starts at this threshold and extends to an upper limit. For example, in TM5 Var, we clipped the values below 0.48 and then set the min-max range in the metadata to 0.48 and 12. If we hadn’t clipped the data, we would have seen many more pixels, and we also want to avoid designating them as "no-data." Currently, my approach involves masking out these values during the data transformation using the provided script. Could this functionality be incorporated into rio-tiler to enhance sustainability?

import rasterio
import numpy as np

url_original = "s3://ghgc-data-store/tm54dvar-ch4flux-monthgrid-v1/methane_emis_total_201601.tif"
url_masked = "s3://ghgc-data-store/tm54dvar-ch4flux-mask-monthgrid-v1/methane_emis_total_201601.tif"

with rasterio.open(url_original) as src:
    metadata = src.meta.copy()
    no_data_value = -9999  
    metadata.update({
        'dtype': 'float32',
        'nodata': no_data_value
    })

    # Create an array to hold the clipped data for all bands
    clipped_data = np.empty((metadata['count'], metadata['height'], metadata['width']), dtype='float32')

    # Iterate through each band
    for i in range(metadata['count']):
        band_data = src.read(i + 1)

        # Create a mask for values below the threshold
        mask = band_data >= threshold

        # Clip pixels and assign no-data value
        clipped_data[i] = np.where(mask, band_data, no_data_value)

    with rasterio.open(output_tiff, 'w', **metadata) as dst:
        dst.write(clipped_data)
siddharth0248 commented 3 weeks ago

@j08lue

j08lue commented 3 weeks ago

Sample data:

j08lue commented 3 weeks ago

So, this is a simple threshold operation - all values below threshold are set to nodata.

The nodata value is also replaced/set (to a hard-coded value of -9999) in the process, though, but it looks like it had that value, at least in this file.

As we discussed, there is still no dedicated API functionality for conditional masking like this, but I expect nonetheless that it can be done with the current API, without modifying the data.

We should explore the use of an expression passed to TiTiler. These are just numexpr strings. Not sure they support where, though. Here is a URL to experiment with: https://earth.gov/ghgcenter/api/raster/cog/preview?url=s3://ghgc-data-store/tm54dvar-ch4flux-monthgrid-v1/methane_emis_total_201601.tif&colormap_name=purd&rescale=0.48%2C24

Alternatively, we could register a custom algorithm in TiTiler that does the masking - i.e. if we created a new algorithm called mask_below, or so, the API call could look like ?algorithm=mask_below&algorithm_params=10

j08lue commented 3 weeks ago

@siddharth0248, can you please confirm

  1. What is the value of threshold for the TM5 data?
  2. Can you please share an unmasked file? Or are files in the tm54dvar-ch4flux-monthgrid-v1 collection unmasked?
j08lue commented 3 weeks ago

Ok, where expressions actually work. Passing where(b1<{threshold}, -9999, b1), e.g. where(b1<10, -9999, b1) does the trick:

j08lue commented 3 weeks ago

We should make sure we instruct users who e.g. load the data into Python or QGIS from the same source also apply this masking. Otherwise, they will get different values e.g. when calculating zonal statistics.

That is the disadvantage of applying the masking on the fly and not fixing the files - slightly worse interoperability.

j08lue commented 1 week ago

To configure this, put this expression into the sourceParams for the dataset, like here:

https://github.com/US-GHG-Center/veda-config-ghg/blob/56574cc3e21723ef290953b6fe91b89bb94240fc/datasets/tm54dvar-ch4flux-monthgrid-v1.data.mdx?plain=1#L62-L67

This block would become

 sourceParams: 
   assets: total 
   colormap_name: purd 
   rescale: 
     - 0.48 
     - 24 
   expression: "where(b1<10, -9999, b1)"