Open kmuehlbauer opened 1 year ago
I don't really like the idea in principle as I think it is the responsibility of the library to contour the data as given, not provide a filtered/sanitised version of the contours.
There is a practical consideration too. As you have established, it is possible to set a minimum line length for contour lines and for filled polygons that do not intersect the domain boundary (or masked quads) as you can simply follow those contours as normal and then just not report them to the user. But for filled polygons that do intersect the boundary you cannot do this. You have to apply your minimum line length to each of the potentially multiple line segments that cross the interior of the domain otherwise they won't sensibly meet their neighbouring contoured polygons. You cannot discard a whole polygon because one of the line segments is too short, you need to backtrack to the last intersection with the boundary and follow that boundary instead. This breaks a fundamental assumption of all current contourpy algorithms that each grid point can be categorised as above or below the contour levels before tracing contours. With the minimum line length idea this information needs to be overridden depending on context. This has quite an impact on the implementation and would cause a significant reduction in performance for all users of the library, so I would not modify the existing algorithms in this way.
It would be possible to write a new algorithm to do this, perhaps with significant reuse of the existing serial
algorithm, which could reside in contourpy
or outside. But I am not personally interested in writing it.
Having disappointed you, how can I help?
If you only wanted to filter out short lines and leave the filled polygons as they are, i.e. just solve the easy part of the problem, then one could write a really simple class that inherits from SerialContourGenerator
and overrides the lines
and create_contour
methods to call those of the base class, but then you can filter out what is returned and pass that on to the caller. This new class could be registered with contourpy
as a valid contour generator so that you could specify it by a string name. Then, with some minor changes to matplotlib
you'd be able to call your contour generator in the usual way. I haven't fully worked out what the API should be for this registering of a new contour generator but I am working on it; I have something that works today but it is experimental and is very likely to change.
If you wanted to modify your filled contours via a similar approach then you would need to know which of the points in each filled contour polygon are on a boundary and which are in the domain interior. This would need one or more new FillType
possibilities to return extra arrays of booleans. This would give you enough information to work out which lines to remove, but you would have the problem of replacing them with points following the domain boundary, so this is much harder than the contour lines case.
Thanks Ian for the detailed explanation and reasoning.
Having disappointed you, how can I help?
No, I'm not disappointed, quite the contrary. I'm happy to follow your suggestion with the SerialContourGenerator
. Would it be possible to keep that on the Python side of things?
I also have a solution, which works for me fixing the filled contours like this
contour_generator
mpl.contour.ContourSet
There is some fiddling with colormap, norm etc. and I do not get exactly matching colorbar compared to xarray/matplotlib but I can live with it. I did not experience any problems with respect to domain boundary/interior. At least I can't see any.
If I knew where to start with the subclass, I'd try to get my above filter
working along your suggestion.
Many thanks for taking the time!
It looks like your approach works fine in your output plot. I think you are lucky that the domain boundary and masked regions are not appreciably affecting your output. In the general case your filled contour solution won't always work. I suppose this is the difference between writing a filtering process for your own use, when you only have to please yourself, and writing one for a library when you have to consider how if affects every user and every use case.
Here is the subclass solution. This is the contents of a file I call my_contour_generator.py
:
import contourpy
class MyContourGenerator(contourpy.SerialContourGenerator):
def create_contour(self, level):
return self.lines(level)
def create_filled_contour(self, lower_level, upper_level):
return self.filled(lower_level, upper_level)
def filled(self, lower_level, upper_level):
filled = super().filled(lower_level, upper_level)
print(f"MyContourGenerator filled, levels={lower_level} {upper_level}")
return filled
def lines(self, level):
lines = super().lines(level)
print(f"MyContourGenerator lines, level={level}")
return lines
# Accessing private dictionary in contourpy isn't recommended, but will
# suffice until there is a "register new contour generator" function.
contourpy._class_lookup["serial"] = MyContourGenerator
There are print
statements where you would call your code to perform the filtering of the returned contour lines and filled contours. The lines()
and filled()
functions are the fundamental ones from contourpy
's point of view but matplotlib
uses create_contour()
and create_filled_contour()
for historical reasons, so these are just synonyms for lines()
and filled()
.
Here is some test code to show this working, it has to be in the same directory as my_contour_generator.py
:
import matplotlib.pyplot as plt
import my_contour_generator
fig, ax = plt.subplots()
ax.contour([[1, 2], [3, 4]], algorithm="serial")
plt.show()
So by registering your contour generator with contourpy
under the name "serial"
, you can use matplotlib
as usual as long as your specify the keyword argument algorithm="serial"
and you get all the benefit of matplotlib
's usual level-determining and colormapping, etc.
There will eventually be some correct way to register a new contour generator, and you won't have to override an existing algorithm name. This is needed at the moment as matplotlib
checks that the name is valid, but this should be changed to defer that check to contourpy
.
For completeness I should say that you could subclass Mpl2014ContourGenerator
instead and register it under the name "mpl2014"
. Then you would be using the 2014 algorithm and wouldn't have to specify the algorithm
keyword argument in your contour()
and contourf()
calls. But I always recommend using "serial"
rather than "mpl2014"
as it is usually faster and uses less RAM.
Thank you very much again, Ian. That was very helpful. I've taken the chance to refine the filtering to use polygon-area as threshold (instead of vertex-count). That way issues with domain-boundary are minimized as well as issues with open lines vs. closed polygons.
To parameterize the area-threshold I utilize an environment variable. That way the code just sticks to the current matplotlib-algorithm (mpl2015
) if the area-treshold is zero or not set at all.
The very nice and comfortable thing is, that I can deploy this everywhere and keep backwards compatibility.
Here is my current code for reference:
def polyarea(p):
"""Calculate area of polygon using shoelace formula.
https://en.wikipedia.org/wiki/Shoelace_formula
.. math::
A = \frac{1}{2} \sum_{i=1}^{n} x_i(y_{i+1} - y_{i-1})
"""
area = 0.0
x = p[:, 0]
y = p[:, 1]
for i in range(len(x)):
area += x[i-1] * (y[i] - y[i-2])
return abs(area) / 2.0
class FilteredContourGenerator(contourpy.Mpl2014ContourGenerator):
def create_contour(self, level):
return self.lines(level)
def create_filled_contour(self, lower_level, upper_level):
return self.filled(lower_level, upper_level)
def filled(self, lower_level, upper_level):
min_area = float(os.environ.get("MPL_MIN_CONTOUR_AREA", "0"))
# get vertices/kinds
vertices, kinds = super().filled(lower_level, upper_level)
# return early if nothing to do
if min_area == 0.:
return vertices, kinds
ncseg = []
nckind = []
# iterate over vertices/kinds
for k, (seg, kind) in enumerate(zip(vertices, kinds)):
nseg = []
nkind = []
# iterate over single contours
start = np.where(kind == 1)[0]
stop = np.where(kind == 79)[0]
for s0, s1 in zip(start, stop):
# calculate area and extend
if polyarea(seg[s0: s1]) >= min_area:
nseg.extend(seg[s0: s1])
nkind.extend(kind[s0: s1])
# combine again
if nseg:
ncseg.append(np.vstack(nseg))
nckind.append(nkind)
return ncseg, nckind
def lines(self, level):
min_area = float(os.environ.get("MPL_MIN_CONTOUR_AREA", "0"))
# get vertices/kinds
vertices, kinds = super().lines(level)
# return early if nothing to do
if min_area == 0.:
return vertices, kinds
ncseg = []
nckind = []
# iterate over vertices/kinds
for k, (seg, kind) in enumerate(zip(vertices, kinds)):
# calculate area and append
if polyarea(seg) >= min_area:
ncseg.append(seg)
nckind.append(kind)
return ncseg, nckind
Example showcase:
import numpy as np
import matplotlib.cm as cm
import matplotlib.pyplot as plt
# overwrite contour algorithm
contourpy._class_lookup["mpl2014"] = FilteredContourGenerator
delta = 0.025
x = np.arange(-3.0, 3.0, delta)
y = np.arange(-2.0, 2.0, delta)
X, Y = np.meshgrid(x, y)
Z1 = np.exp(-X**2 - Y**2)
Z2 = np.exp(-(X - 1)**2 - (Y - 1)**2)
Z = (Z1 - Z2) * 2
fig, ax = plt.subplots(1, 2, figsize=(10, 5))
min_area = 0
# set area-threshold to zero
os.environ["MPL_MIN_CONTOUR_AREA"] = f"{min_area}"
CSF = ax[0].contourf(X, Y, Z)
CS = ax[0].contour(X, Y, Z, colors="k")
ax[0].clabel(CS, inline=True, fontsize=10)
ax[0].set_title(f'Simplest default with labels - min_area {min_area}')
min_area = 0.5
# set area-threshold to 0.5
os.environ["MPL_MIN_CONTOUR_AREA"] = f"{min_area}"
CSF = ax[1].contourf(X, Y, Z)
CS = ax[1].contour(X, Y, Z, colors="k")
ax[1].clabel(CS, inline=True, fontsize=10)
ax[1].set_title(f'Simplest default with labels - min_area {min_area}')
@ianthomas23 Thanks for this neat package.
I have case where I want to use my current workflow using xarray/matplotlib based plotting and get many speckles (short contours lines/filles) which I would like to remove.
I've taken two ways now, both with unsatisfactory results:
Both work more or less well for contour lines, but for filled contours 1. isn't working nicely since you would have to fill small holes/fixup those polygons and for 2. I'd have to handle the whole cmap/level stuff myself.
I know I can smooth/downsample my data prior contouring to make it look better (and I did that to some extent) but still would like to remove some of the short lines.
Is there any chance to parameterize the minimal contour line length within contourpy? This could be transparently moved as kwarg through the stack, so that all downstream users could benefit.
Anyway, if you have more suggestion on how to tackle the problem, I'd appreciate any comments.