extract_track bug - Githubissues

DHI / mikeio

Read, write and manipulate dfs0, dfs1, dfs2, dfs3, dfsu and mesh files.

https://dhi.github.io/mikeio

BSD 3-Clause "New" or "Revised" License

138 stars 55 forks source link

extract_track bug #697

Open daniel-caichac-DHI opened 4 months ago

daniel-caichac-DHI commented 4 months ago

Describe the bug Hi, I was facing issues with modelskill track comparisons, so I had to dig deeper (thanks to JAN for the tips), but there seems to be an issue with the .extract_track tool when working with UTM coordinates at least. I ended up running dataextractionfm.exe and that one worked fine, but it breaks the workflow of having a single notebook.

The issue is that it is not finding the paired data, when indeed there is.

To Reproduce I am uploading a zip file with csv file of tracks, a 1-day dfsu with results, and the full workflow (notebook) to reproduce the issue. Bug.zip

System information:

Python version: 3.10.8
MIKE IO version : 1.6.2

ecomodeller commented 4 months ago

The error is caused by the GeometryFM2D.contains method, which incorrectly identifies points as outside the domain.

Here is a breakdown of the boundary segments, where some of the exterior segments seem to be classified as interior.

import matplotlib.pyplot as plt
import pandas as pd

import mikeio

df = pd.read_csv('Altmetry_data_debug.csv',index_col=0,parse_dates=True)
df_utm = df[['easting','northing','significant_wave_height']]
ds = mikeio.Dfsu('Model_subset.dfsu').read()
fig, ax = plt.subplots(figsize=(10,4))
g = ds.geometry

for exterior in g.boundary_polylines.exteriors:
    ax.plot(*exterior.xy.T, color='k', linewidth=1.2)
for interior in g.boundary_polylines.interiors:
    ax.plot(*interior.xy.T, color='r', linewidth=0.5, linestyle='dashed')
plt.scatter(df_utm['easting'],df_utm['northing']);

@jsmariegaard I suppose the detection if a point is inside the domain or not, is handled in a different way in DataTrackExtractionFM.exe?

daniel-caichac-DHI commented 4 months ago

This explains why all the modelskill comparisons where done with data in this part only

ecomodeller commented 4 months ago

If I replace the current implementation with shapely it seems to work as expected. The reason why this approach is not the default, was that is can be quite slow to construct.

def contains(self, points: np.ndarray) -> np.ndarray:
        """test if a list of points are contained by mesh

        Parameters
        ----------
        points : array-like n-by-2
            x,y-coordinates of n points to be tested

        Returns
        -------
        bool array
            True for points inside, False otherwise
        """
        # import matplotlib.path as mp 

        # points = np.atleast_2d(points)

        # exterior = self.boundary_polylines.exteriors[0]
        # cnts = mp.Path(exterior.xy).contains_points(points)

        # if self.boundary_polylines.n_exteriors > 1:
        #     # in case of several dis-joint outer domains
        #     for exterior in self.boundary_polylines.exteriors[1:]:
        #         in_domain = mp.Path(exterior.xy).contains_points(points)
        #         cnts = np.logical_or(cnts, in_domain)

        # # subtract any holes
        # for interior in self.boundary_polylines.interiors:
        #     in_hole = mp.Path(interior.xy).contains_points(points)
        #     cnts = np.logical_and(cnts, ~in_hole)

        # return cnts

        from shapely.geometry import Point

        domain = self.to_shapely().buffer(0)
        return np.array([domain.contains(Point(p)) for p in points])

daniel-caichac-DHI commented 4 months ago

I was thinking yesterday about it, as in, why not use shapely or geopandas or similar (OsGeo or any of those GIS python packages) that already have some fancy .contains or .inteserction methods implemented

ecomodeller commented 4 months ago

I was thinking yesterday about it, as in, why not use shapely or geopandas or similar (OsGeo or any of those GIS python packages) that already have some fancy .contains or .inteserction methods implemented

See this branch https://github.com/DHI/mikeio/tree/geometryFM_contains_shapely

ecomodeller commented 4 months ago

This problem is not only relevant for track extraction.

It is a problem for point extraction as well in some cases when using IDW interpolation.

Below is an example of a tide gauge located on an island, where the 5 nearest elements creates a domain, where the contains method fails.

daniel-caichac-DHI commented 4 months ago

But then why don't we:

create a failing test (we can use the files I sent and the example you just showed)
change to the solution with shapely
pass the tests while sacrificing speed, and then
add to the backlog as task of improve_efficiency_contains_points or similar ?

ecomodeller commented 4 months ago

But then why don't we:

create a failing test (we can use the files I sent and the example you just showed)

change to the solution with shapely

pass the tests while sacrificing speed, and then

add to the backlog as task of improve_efficiency_contains_points or similar ?

We will!