TUDelftGeodesy / stmtools

Xarray extension for Space-Time Matrix
https://tudelftgeodesy.github.io/stmtools/
Apache License 2.0

Unit test for duplicated points in an STM #75

Open rogerkuou opened 6 months ago

rogerkuou commented 6 months ago

Issue coming from a discussion in PR #66

A duplicated coordinate (lat, lon, time) will cause the spatio-temporal query enrich_from_dataset to fail. We need a check function that validates there are no duplicated 3D coordinates in an STM.

Also quoting Sarah's comments here, which are worth considering:

@rogerkuou to check whether the points are unique, the test `np.unique(ds['lat'].values).shape == ds['lat'].values.shape` is not enough, because it only checks for duplicates in one dimension, here lat. Points can, for example, all lie on one line and share the same lat while still being distinct. Instead, we need a test for cases where the full (lat, lon, time) coordinate is duplicated. Functions like xarray.Dataset.drop_duplicates and pandas.DataFrame.duplicated can be used to write such a test, but these functions only work on dims and not on coords. In our case, lat and lon are coords and space is the dim, so we might need to use unstack, which leads to memory problems.
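To illustrate the difference between the one-dimensional test and a test on the combined coordinates, here is a minimal sketch using plain NumPy arrays. The helper name `duplicated_mask` is hypothetical and not part of stmtools. The sketch assumes the coordinate arrays are 1D, numeric, and of equal length (for example `ds['lat'].values` and `ds['lon'].values` along the space dim); datetime values would need to be converted to numbers first.

```python
import numpy as np


def duplicated_mask(*coords):
    """Return a boolean mask marking points whose combined coordinates repeat.

    Hypothetical helper, not part of stmtools. Assumes 1D numeric arrays
    of equal length, one array per coordinate (e.g. lat, lon).
    """
    # One row per point, one column per coordinate.
    stacked = np.column_stack([np.asarray(c) for c in coords])
    # Unique rows, plus for each point the index of its unique row
    # and how many times each unique row occurs.
    _, inverse, counts = np.unique(
        stacked, axis=0, return_inverse=True, return_counts=True
    )
    # ravel() keeps this robust to NumPy versions that return a 2D inverse.
    return counts[inverse.ravel()] > 1


# Four points on one line of constant latitude; points 1 and 3 coincide.
lat = np.array([52.0, 52.0, 52.0, 52.0])
lon = np.array([4.0, 4.1, 4.2, 4.1])

# The 1D test flags every point, although only one pair is a true duplicate.
lat_only_unique = np.unique(lat).shape == lat.shape

# The combined test marks only the repeated (lat, lon) pair.
mask = duplicated_mask(lat, lon)
```

Here `lat_only_unique` is `False` even for the three distinct points, while `mask` is `True` only at positions 1 and 3. Because the check works on the coordinate arrays directly, it avoids unstacking the dataset.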

SarahAlidoost commented 6 months ago

and also this:

As discussed, scipy KDTree works if the coordinates (lat, lon, time) are duplicated and the values of the variables, e.g. temperature, are the same too. I added a test for this. If the values of the variables are not the same for the duplicated coordinates, macOS and Linux behave differently in picking the value of the nearest neighbor.
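The ambiguity can be sketched with a small example. It assumes `scipy.spatial.KDTree` is queried directly on (lat, lon) pairs; the arrays and values are made up for illustration, and this is not how stmtools necessarily builds its query.

```python
import numpy as np
from scipy.spatial import KDTree

# Points 1 and 3 share the same (lat, lon) but carry different values.
points = np.array([[52.0, 4.0], [52.0, 4.1], [52.0, 4.2], [52.0, 4.1]])
temperature = np.array([10.0, 11.0, 12.0, 99.0])

tree = KDTree(points)

# Ask for the two nearest neighbours of the duplicated location.
distances, indices = tree.query([52.0, 4.1], k=2)

# Both duplicates lie at distance zero, so either is a valid "nearest"
# neighbour; which one comes first is not something to rely on.
candidates = sorted(temperature[indices].tolist())
```

Both entries of `distances` are zero and `candidates` holds both temperature values, so a single nearest-neighbor lookup has no well-defined answer for this location. When the duplicated points carry identical values, the choice does not affect the result.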

Note that duplicated coordinates with two different values for one variable are rare. However, this case might occur during data preparation, for example if the coordinates are rounded.
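As a small illustration of how rounding can produce such duplicates, the sketch below rounds two distinct, made-up points to one decimal place. The coordinate and temperature values are assumptions chosen for the example.

```python
import numpy as np

# Two distinct points with different temperature values.
lat = np.array([52.01, 52.04])
lon = np.array([4.11, 4.12])
temperature = np.array([10.0, 12.5])

# Rounding to one decimal collapses both points onto the same location.
rounded = np.column_stack([np.round(lat, 1), np.round(lon, 1)])

n_unique_before = np.unique(np.column_stack([lat, lon]), axis=0).shape[0]
n_unique_after = np.unique(rounded, axis=0).shape[0]
```

Before rounding there are two unique locations; after rounding there is one, which now has two different temperature values attached to it.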