Open-EO / openeo-geotrellis-extensions

Java/Scala extensions for Geotrellis, for use with OpenEO GeoPySpark backend.
Apache License 2.0
5 stars 3 forks source link

issue with not operator #61

Open clausmichele opened 2 years ago

clausmichele commented 2 years ago

While creating this example for the forum https://discuss.eodc.eu/t/conditional-replacement-e-g-where-condition-true-false/403 I've noticed I was getting a wrong result while using the not operator. Probably it is due to some internal data conversion of the result of the first mask which I later try to invert. Additionally, I'm not sure if it's an issue with the VITO back-end or with SENTINELHUB, so I tag both @jdries and @dthiex here.

An example of this behavior can be reproduced with the following code:

import openeo
from openeo.processes import eq, not_
import xarray as xr

openeoHost = 'https://openeo.cloud'
conn = openeo.connect(openeoHost).authenticate_oidc('egi')

im = conn.load_collection(
        'SENTINEL2_L2A_SENTINELHUB',
        spatial_extent={'west': 11.304588, 'east': 11.377716, 'south': 46.465311, 'north': 46.516839},
        temporal_extent=['2021-06-01', '2021-08-01'],
        bands = ['B04','B08'],
        properties={
            'eo:cloud_cover': lambda x: lte(x, 50)
        })
median_ndvi_june = im.filter_temporal(['2021-06-01', '2021-07-01']).ndvi().reduce_dimension(dimension='t',reducer='median')
median_ndvi_july = im.filter_temporal(['2021-07-01', '2021-08-01']).ndvi().reduce_dimension(dimension='t',reducer='median')

# Check weather the median NDVI is above 0.8 in both months. If yes set the area to 100, else to 50

value_0 = 100
value_1 = 50

# Firstly create a boolean mask for each month
median_ndvi_june_mask = median_ndvi_june > 0.8
median_ndvi_july_mask = median_ndvi_july > 0.8

# Secondly merge the masks with a boolean AND operation
mask_june_and_july_gt_0_8 = median_ndvi_june_mask.merge_cubes(median_ndvi_july_mask,overlap_resolver='and')
mask_june_and_july_lte_0_8 = mask_june_and_july_gt_0_8.apply(lambda x: not_(x))

mask_june_and_july_lte_0_8.download('mask_june_and_july_lte_0_8.nc')

result has negative values:

image

soxofaan commented 2 years ago

What values do you get without that not operation? And what do you get when doing not directly on median_ndvi_june_mask (so without the merge cubes)?

clausmichele commented 2 years ago

Here's the code to do the test you're asking:

import openeo
from openeo.processes import eq, not_
import xarray as xr

openeoHost = 'https://openeo.cloud'
conn = openeo.connect(openeoHost).authenticate_oidc('egi')

im = conn.load_collection(
        'SENTINEL2_L2A_SENTINELHUB',
        spatial_extent={'west': 11.304588, 'east': 11.377716, 'south': 46.465311, 'north': 46.516839},
        temporal_extent=['2021-06-01', '2021-08-01'],
        bands = ['B04','B08'],
        properties={
            'eo:cloud_cover': lambda x: lte(x, 50)
        })
median_ndvi_june = im.filter_temporal(['2021-06-01', '2021-07-01']).ndvi().reduce_dimension(dimension='t',reducer='median')
median_ndvi_july = im.filter_temporal(['2021-07-01', '2021-08-01']).ndvi().reduce_dimension(dimension='t',reducer='median')

# Check weather the median NDVI is above 0.8 in both months. If yes set the area to 100, else to 50

value_0 = 100
value_1 = 50

# Firstly create a boolean mask for each month
median_ndvi_june_mask = median_ndvi_june > 0.8
median_ndvi_july_mask = median_ndvi_july > 0.8

# Secondly merge the masks with a boolean AND operation
mask_june_and_july_gt_0_8 = median_ndvi_june_mask.merge_cubes(median_ndvi_july_mask,overlap_resolver='and')
mask_june_and_july_gt_0_8.download('mask_june_and_july_gt_0_8.nc')

median_ndvi_june_mask_not = median_ndvi_june_mask.apply(lambda x: not_(x))

median_ndvi_june_mask_not.download('median_ndvi_june_mask_not.nc')

So, even using the not before merge_cubes returns negative values:

image

soxofaan commented 2 years ago

probably an issue in openeo-geotrellis-extensions

dthiex commented 2 years ago

I didn't figure out the exact issue but it's not a problem with the data coming from Sentinel Hub.

Here a more minimal example to test/replicate:

import openeo
from openeo.processes import eq, not_, lte, ndvi
import xarray as xr

openeoHost = 'https://openeo.cloud'
conn = openeo.connect(openeoHost).authenticate_oidc('egi')

# load minmal data cube
im = conn.load_collection(
        'SENTINEL2_L2A_SENTINELHUB',
        spatial_extent={'west': 11.35859, 'east':  11.365804, 'south': 46.504182, 'north': 46.507018},
        temporal_extent=['2021-06-02', '2021-06-02'],
        bands = ['B04','B08']
        )

# calculate NDVI
ndvi = im.ndvi()

# Create boolean mask 
ndvi_june_mask = ndvi > 0.8

Until here the data is still correct meaning it contains 0/1s.

# Revert mask
ndvi_june_mask_not = ndvi_june_mask.apply(lambda x: not_(x))

# download ndvi data
output_tiff = "median_ndvi_june_mask_not.tif"
ndvi_june_mask_not.download(output_tiff, format="GTiff")

The problem seems to be indeed in the not_ process. I tested it on all backends and always got a strange results (values of -1/-2).

jdries commented 2 years ago

Problem was that our callback handling generates floats by default, I now made it a bit smarter to also be able to predict type output type for a number of logical operators. The general problem that still needs to be solved is to evaluate the expected target cell type, without performing the actual computations.