LuiseQuoss / ebvcube

Accessing, visualising and creating EBV netCDF datasets. Download datasets from the EBV Data Portal: https://portal.geobon.org/
GNU General Public License v3.0

How does the package process raster metadata (scale, offset, nodata) when passed to ebv_add? #42

Open bettasimousss opened 2 months ago

bettasimousss commented 2 months ago

Hello,

I have a set of raster datasets in GeoTIFF format. They are stored as UINT16 (scale = 10000, offset = 0) with NoData set to 65535. After setting up the metadata on the portal and downloading the JSON, I create and populate the netCDF file with these rasters as follows:

```r
ebv_create_taxonomy(
  jsonpath   = metadata_json,
  outputpath = newNc,
  taxonomy   = taxo_path,
  sep        = ',',
  lsid       = FALSE,
  epsg       = 3035,
  resolution = c(1000, 1000),
  prec       = 'integer',
  fillvalue  = 65535,
  extent     = c(723000, 7700000, 160000, 6615000),
  overwrite  = TRUE,
  verbose    = FALSE
)
```

At this point, visualizing the file in Panoply, the arrays are filled with 65535 as expected. Now I wonder whether I should add the data layers as paths to the TIFFs; in that case ebvcube seems to read them as float, overriding the NoData value with the default float one and setting all valid values to 0.

Or is it preferable to add the data as arrays, providing the values as integers (without applying the scale)? In that case, how do I set up the scale so that the visualization in the portal is still on the [0, 1] range?
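For context, the declared datatype and NoData value of such a TIFF can be inspected before handing it over; a minimal sketch assuming the terra package ("my_layer.tif" is a hypothetical placeholder, not one of the actual files):

```r
# Minimal sketch: confirm what the GeoTIFF itself declares.
# Assumes terra; "my_layer.tif" is a placeholder file name.
library(terra)

r <- rast("my_layer.tif")
datatype(r)              # expected: "INT2U" (UINT16) for these files
describe("my_layer.tif") # gdalinfo output, including the NoData value (65535)
```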

Thanks in advance for your support !

LuiseQuoss commented 1 month ago

Hello Sara,

thanks for your feedback and questions. We have not yet implemented scale and offset. There are two possible approaches for you:

  1. Pass your data as float values and do the same with the NoData value. For this you need to read the data from the TIFFs and pass it to ebv_add_data() as arrays with the values converted to floats.
  2. Pass the integer values from the TIFF files directly to ebv_add_data() without converting them to floats. Clearly define the units in the netCDF metadata as something like “Continuous 0-1 Score (x10000)” to inform users that values need to be divided by 10000 to obtain scores in the 0-1 range. This dataset shows a similar implementation. Option 2 seems the most suitable, given that you are dealing with very large datasets (see the sketch after this list).
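As a rough illustration of both options, here is a hedged sketch. It assumes terra for reading the TIFF and an ebv_add_data() call with filepath_nc, datacubepath, entity, timestep and data arguments; the file name, datacube path, entity and float fill value are all hypothetical placeholders, so check ?ebv_add_data and your own netCDF structure before adapting it:

```r
# Hedged sketch, not tested against this dataset. Assumes:
# - terra is installed for reading the GeoTIFF
# - "my_layer.tif", the datacube path "metric_1/ebv_cube" and
#   entity 1 are placeholders for your actual file and structure
library(terra)
library(ebvcube)

r    <- rast("my_layer.tif")
vals <- as.matrix(r, wide = TRUE)  # raw UINT16 values, NoData = 65535

# Option 1: convert to float scores and use a float NoData value.
# The netCDF must then be created with prec = 'float' and a float
# fillvalue (here -9999 as an example), not the integer setup above.
vals_f <- vals / 10000
vals_f[vals == 65535] <- -9999     # must match the netCDF's fillvalue

ebv_add_data(filepath_nc  = newNc,
             datacubepath = "metric_1/ebv_cube",
             entity = 1, timestep = 1,
             data = vals_f)

# Option 2: keep the integers and the integer fillvalue (65535),
# and document the x10000 scaling in the units metadata instead.
ebv_add_data(filepath_nc  = newNc,
             datacubepath = "metric_1/ebv_cube",
             entity = 1, timestep = 1,
             data = vals)
```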

We have plans to implement scale and offset for the EBVCube netCDFs. However, this will only happen in the mid-term.

I started investigating the overwriting of the NoData value. Thanks for pointing it out!