Open daniel-caichac-DHI opened 4 months ago
The error is caused by the GeometryFM2D.contains
method, which incorrectly identifies points as outside the domain.
Here is a breakdown of the boundary segments, where some of the exterior segments seem to be classified as interior.
import matplotlib.pyplot as plt
import pandas as pd
import mikeio
df = pd.read_csv('Altmetry_data_debug.csv',index_col=0,parse_dates=True)
df_utm = df[['easting','northing','significant_wave_height']]
ds = mikeio.Dfsu('Model_subset.dfsu').read()
fig, ax = plt.subplots(figsize=(10,4))
g = ds.geometry
for exterior in g.boundary_polylines.exteriors:
ax.plot(*exterior.xy.T, color='k', linewidth=1.2)
for interior in g.boundary_polylines.interiors:
ax.plot(*interior.xy.T, color='r', linewidth=0.5, linestyle='dashed')
plt.scatter(df_utm['easting'],df_utm['northing']);
@jsmariegaard I suppose the detection if a point is inside the domain or not, is handled in a different way in DataTrackExtractionFM.exe
?
This explains why all the modelskill comparisons where done with data in this part only
If I replace the current implementation with shapely it seems to work as expected. The reason why this approach is not the default, was that is can be quite slow to construct.
def contains(self, points: np.ndarray) -> np.ndarray:
"""test if a list of points are contained by mesh
Parameters
----------
points : array-like n-by-2
x,y-coordinates of n points to be tested
Returns
-------
bool array
True for points inside, False otherwise
"""
# import matplotlib.path as mp
# points = np.atleast_2d(points)
# exterior = self.boundary_polylines.exteriors[0]
# cnts = mp.Path(exterior.xy).contains_points(points)
# if self.boundary_polylines.n_exteriors > 1:
# # in case of several dis-joint outer domains
# for exterior in self.boundary_polylines.exteriors[1:]:
# in_domain = mp.Path(exterior.xy).contains_points(points)
# cnts = np.logical_or(cnts, in_domain)
# # subtract any holes
# for interior in self.boundary_polylines.interiors:
# in_hole = mp.Path(interior.xy).contains_points(points)
# cnts = np.logical_and(cnts, ~in_hole)
# return cnts
from shapely.geometry import Point
domain = self.to_shapely().buffer(0)
return np.array([domain.contains(Point(p)) for p in points])
I was thinking yesterday about it, as in, why not use shapely or geopandas or similar (OsGeo or any of those GIS python packages) that already have some fancy .contains
or .inteserction
methods implemented
I was thinking yesterday about it, as in, why not use shapely or geopandas or similar (OsGeo or any of those GIS python packages) that already have some fancy
.contains
or.inteserction
methods implemented
See this branch https://github.com/DHI/mikeio/tree/geometryFM_contains_shapely
This problem is not only relevant for track extraction.
It is a problem for point extraction as well in some cases when using IDW interpolation.
Below is an example of a tide gauge located on an island, where the 5 nearest elements creates a domain, where the contains method fails.
But then why don't we:
improve_efficiency_contains_points
or similar
?But then why don't we:
- create a failing test (we can use the files I sent and the example you just showed)
- change to the solution with shapely
- pass the tests while sacrificing speed, and then
- add to the backlog as task of
improve_efficiency_contains_points
or similar ?
We will!
Describe the bug Hi, I was facing issues with modelskill track comparisons, so I had to dig deeper (thanks to JAN for the tips), but there seems to be an issue with the
.extract_track
tool when working with UTM coordinates at least. I ended up runningdataextractionfm.exe
and that one worked fine, but it breaks the workflow of having a single notebook.The issue is that it is not finding the paired data, when indeed there is.
To Reproduce I am uploading a zip file with csv file of tracks, a 1-day dfsu with results, and the full workflow (notebook) to reproduce the issue. Bug.zip
System information: