WFP-VAM / prism-app

PRISM is an interactive map-based dashboard that simplifies the integration of geospatial data on hazards, along with information on socioeconomic vulnerability
MIT License

[Bug]: Analysis results failing for layers with scale factors #1298

Open wadhwamatic opened 3 days ago

wadhwamatic commented 3 days ago

What happened?

This issue surfaced when testing PR https://github.com/WFP-VAM/prism-app/pull/1297 though is not related to the PR.

If I run analysis for a layer whose scale factor is set via wcsConfig in layers.json, I get one of three errors: all values are zero, all are null, or all are equal to the offset value. I tested this in Mozambique and RBD and see the same issue in both cases.
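The three symptoms are what one would expect if the zonal statistics come back as zero or null before any scaling is applied. Here is a minimal sketch, assuming the conventional linear rescaling `raw * scale + offset`. The function name and the numbers are hypothetical and are not taken from layers.json.

```python
from typing import Optional


def apply_scale(raw: Optional[float], scale: float, offset: float = 0.0) -> Optional[float]:
    """Rescale a raw statistic; the linear formula is an assumption, not PRISM's confirmed code."""
    if raw is None:
        # A missing statistic stays missing after scaling.
        return None
    return raw * scale + offset


# Hypothetical scale/offset values, chosen only to illustrate the three symptoms.
zero_without_offset = apply_scale(0, 0.0001)        # raw zero, no offset
zero_with_offset = apply_scale(0, 0.02, -273.15)    # raw zero, non-zero offset
missing = apply_scale(None, 0.02, -273.15)          # no raw value at all

print(zero_without_offset, zero_with_offset, missing)
```

Under this assumption, a raw zero shows up as zero for layers without an offset and as the offset itself for layers with one, while a missing statistic stays null.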

These are the errors per layer (see screenshots in the next section)

Steps to reproduce:

  1. REACT_APP_COUNTRY=mozambique yarn start
  2. Run analysis with any of the above hazard layers, lowest level admin boundary as baseline (admin 3 for Mozambique), default for all others
  3. View table to see values for each admin area

Which country / deployment are you running?

main / mozambique & RBD

Add a screenshot (if relevant)

NDVI (all zeroes):

Screenshot 2024-07-04 at 14 48 51

LST (all values = offset)

Screenshot 2024-07-04 at 15 00 57

LST anomaly (null values):

Screenshot 2024-07-04 at 14 53 52

LST amplitude (all zeroes):

Screenshot 2024-07-04 at 14 53 00
ericboucher commented 3 days ago

@wadhwamatic I investigated the issue and indeed it looks like the problem is in the backend. More specifically, this has been introduced by the switch to STAC.

However, I am not exactly sure what is happening. It could be a resolution issue or something to do with no_data representations. The CRS is also missing and should probably be set to EPSG:4326 (WGS 84).

Here are a few test files for devs, which can be used with the following parameters:

{"geotiff_url":"{working or not working file}","zones_url":"https://prism-admin-boundaries.s3.us-east-2.amazonaws.com/moz_bnd_adm2_WFP.json","group_by":"adm2_source_id","geojson_out":false}

working geotiff: https://api.earthobservation.vam.wfp.org/ows/?bbox=30.2%2C-26.9%2C40.8%2C-10.5&coverage=mxd13a2_viq_dekad&crs=EPSG%3A4326&format=GeoTIFF&height=2721&request=GetCoverage&service=WCS&time=2024-06-11&version=1.0.0&width=1763

Archive.zip
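For anyone comparing the two files, decoding the working WCS request shows exactly what the server was asked to return. This is a small sketch using only the Python standard library; the helper name `describe_wcs_request` is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

WORKING_URL = (
    "https://api.earthobservation.vam.wfp.org/ows/"
    "?bbox=30.2%2C-26.9%2C40.8%2C-10.5&coverage=mxd13a2_viq_dekad"
    "&crs=EPSG%3A4326&format=GeoTIFF&height=2721&request=GetCoverage"
    "&service=WCS&time=2024-06-11&version=1.0.0&width=1763"
)


def describe_wcs_request(url: str) -> dict:
    """Extract the CRS, bounding box, and grid size from a WCS GetCoverage URL."""
    query = parse_qs(urlsplit(url).query)
    params = {key: values[0] for key, values in query.items()}
    west, south, east, north = (float(v) for v in params["bbox"].split(","))
    width, height = int(params["width"]), int(params["height"])
    return {
        "crs": params["crs"],
        "bbox": (west, south, east, north),
        "size": (width, height),
        # Approximate pixel size in degrees implied by bbox and grid size.
        "pixel_size_deg": ((east - west) / width, (north - south) / height),
    }


info = describe_wcs_request(WORKING_URL)
print(info)
```

The request explicitly names `crs=EPSG:4326` along with a bounding box in degrees, so the output CRS and grid of the working file were decided by the request and produced by the server.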

wadhwamatic commented 3 days ago

Thanks @ericboucher. Looping in @valpesendorfer to help us troubleshoot this issue.

Valentin - production of zonal stats is failing on a few layers. Eric has found a couple of issues that arose after the switch to STAC. The layers causing issues are:

Could you take a look and let's try to sync early next week?

ericboucher commented 11 hours ago

@wadhwamatic @valpesendorfer nota bene: on our end it does not seem to be an issue with the "scale factor" per se, since the scaling happens in the frontend. But maybe something different is happening in the processing of these layers on the STAC side?

valpesendorfer commented 7 hours ago

Sorry, I haven't had time to dive into this yet. But just to clarify: there's no processing happening on the STAC side. The STAC item is just a metadata construct with the link to the actual file on S3 (scaled if applicable) and the corresponding metadata (collection, extent, nodata & scale factor if applicable).

So by switching to STAC either you were relying on something which happened automatically through WCS, or the change in the logic introduced a bug.
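To make the "metadata construct" point concrete, here is a rough sketch of what a client might read from a STAC item. The item below is entirely hypothetical (id, href, and values are made up). The field names `raster:bands` and `proj:epsg` come from the STAC raster and projection extensions, and whether HDC's items use those extensions is an assumption.

```python
from typing import Any, Dict

# Hypothetical item; not copied from the HDC STAC API.
EXAMPLE_ITEM: Dict[str, Any] = {
    "type": "Feature",
    "id": "example-item",
    "collection": "example-collection",
    "bbox": [30.2, -26.9, 40.8, -10.5],
    "properties": {
        "datetime": "2024-06-11T00:00:00Z",
        "proj:epsg": None,  # assumed: no EPSG code for the file's native projection
    },
    "assets": {
        "data": {
            "href": "https://example.com/path/to/file.tif",
            "raster:bands": [{"nodata": -3000, "scale": 0.0001, "offset": 0.0}],
        }
    },
}


def summarize_asset(item: Dict[str, Any], asset_key: str = "data") -> Dict[str, Any]:
    """Collect what the item says about the file; nothing here touches pixel data."""
    asset = item["assets"][asset_key]
    band = (asset.get("raster:bands") or [{}])[0]
    return {
        "href": asset["href"],
        "nodata": band.get("nodata"),
        "scale": band.get("scale", 1.0),
        "offset": band.get("offset", 0.0),
        "epsg": item["properties"].get("proj:epsg"),
    }


def client_must_reproject(item: Dict[str, Any], target_epsg: int = 4326) -> bool:
    """True when the declared CRS differs from (or does not state) the target EPSG code."""
    return item["properties"].get("proj:epsg") != target_epsg


summary = summarize_asset(EXAMPLE_ITEM)
print(summary, client_must_reproject(EXAMPLE_ITEM))
```

Everything the item provides is descriptive. Reading the file, masking nodata, applying scale and offset, and handling the CRS are left to whoever follows the `href`.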

valpesendorfer commented 1 hour ago

Ok, so I had a look and I know what the issue is:

The STAC API just serves the metadata information for HDC's STAC collections and their items, and that's it. All processing tasks are the responsibility of the client. When you switched over from OWS to STAC, you took on all of these responsibilities, some of which the OWS server had been handling for you.

In particular, this is about reprojection: none of the datasets above are in epsg:4326; they are in the native MODIS projection. Previously, you requested the TIFF from WCS and the server did the reprojection for you. Now, with STAC, the client needs to take care of reprojection.

If you look at the two example rasters above (working & not working) you see they both have data; one is in epsg:4326 and the other is not. What I'm assuming is happening in the analysis step is that you're trying to extract values with an epsg:4326 vector layer, which does not find any values at these locations, returning zeros/null/whatever.
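The mismatch can be illustrated without any GIS library. The sketch below assumes "native MODIS projection" means the MODIS sinusoidal grid (sphere radius 6371007.181 m), borrows the bounding box from the working WCS request as a stand-in for the raster extent, and uses a hypothetical sample point inside Mozambique. A real pipeline would use a projection library rather than this hand-written formula.

```python
import math

MODIS_SPHERE_RADIUS_M = 6371007.181  # radius of the sphere used by the MODIS sinusoidal grid


def lonlat_to_sinusoidal(lon_deg: float, lat_deg: float) -> tuple:
    """Forward sinusoidal projection on a sphere: degrees in, metres out."""
    lat = math.radians(lat_deg)
    x = MODIS_SPHERE_RADIUS_M * math.radians(lon_deg) * math.cos(lat)
    y = MODIS_SPHERE_RADIUS_M * lat
    return x, y


def sinusoidal_bounds(west: float, south: float, east: float, north: float) -> tuple:
    """Envelope of the projected corners (sufficient for a box within one hemisphere quadrant)."""
    corners = [lonlat_to_sinusoidal(lon, lat) for lon in (west, east) for lat in (south, north)]
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs), max(ys)


def contains(bounds: tuple, x: float, y: float) -> bool:
    min_x, min_y, max_x, max_y = bounds
    return min_x <= x <= max_x and min_y <= y <= max_y


# Assumed raster extent, expressed in sinusoidal metres.
raster_bounds = sinusoidal_bounds(30.2, -26.9, 40.8, -10.5)

# Hypothetical point inside an admin boundary, given in epsg:4326 degrees.
point_lon, point_lat = 35.0, -18.0

# Using the degree values directly as if they were raster coordinates misses the raster.
naive_hit = contains(raster_bounds, point_lon, point_lat)
# Projecting the point into the raster's CRS first lands inside it.
reprojected_hit = contains(raster_bounds, *lonlat_to_sinusoidal(point_lon, point_lat))

print(naive_hit, reprojected_hit)  # False True
```

The degree coordinates, read as metres, sit a few tens of metres from the projection origin, millions of metres away from the raster. This is consistent with the zonal extraction finding no pixels under the admin boundaries.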

Two proposals for a solution:

TBH, the second option is the cleaner, more flexible and by different measures "more correct" one.