contourpy / contourpy

Python library for calculating contours in 2D quadrilateral grids
https://contourpy.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
66 stars 19 forks source link

Howto remove short contour lines/filles #212

Open kmuehlbauer opened 1 year ago

kmuehlbauer commented 1 year ago

@ianthomas23 Thanks for this neat package.

I have case where I want to use my current workflow using xarray/matplotlib based plotting and get many speckles (short contours lines/filles) which I would like to remove.

I've taken two ways now, both with unsatisfactory results:

  1. grab the resulting contourset and iterate over path collections and remove the short lines/filles
  2. use contourpy to create the lines/filles and create the matplotlib Contourset/Collections from scratch

Both work more or less well for contour lines, but for filled contours 1. isn't working nicely since you would have to fill small holes/fixup those polygons and for 2. I'd have to handle the whole cmap/level stuff myself.

I know I can smooth/downsample my data prior contouring to make it look better (and I did that to some extent) but still would like to remove some of the short lines.

Is there any chance to parameterize the minimal contour line length within contourpy? This could be transparently moved as kwarg through the stack, so that all downstream users could benefit.

Anyway, if you have more suggestion on how to tackle the problem, I'd appreciate any comments.

ianthomas23 commented 1 year ago

I don't really like the idea in principle as I think it is the responsibility of the library to contour the data as given, not provide a filtered/sanitised version of the contours.

There is a practical consideration too. As you have established, it is possible to set a minimum line length for contour lines and for filled polygons that do not intersect the domain boundary (or masked quads) as you can simply follow those contours as normal and then just not report them to the user. But for filled polygons that do intersect the boundary you cannot do this. You have to apply your minimum line length to each of the potentially multiple line segments that cross the interior of the domain otherwise they won't sensibly meet their neighbouring contoured polygons. You cannot discard a whole polygon because one of the line segments is too short, you need to backtrack to the last intersection with the boundary and follow that boundary instead. This breaks a fundamental assumption of all current contourpy algorithms that each grid point can be categorised as above or below the contour levels before tracing contours. With the minimum line length idea this information needs to be overridden depending on context. This has quite an impact on the implementation and would cause a significant reduction in performance for all users of the library, so I would not modify the existing algorithms in this way.

It would be possible to write a new algorithm to do this, perhaps with significant reuse of the existing serial algorithm, which could reside in contourpy or outside. But I am not personally interested in writing it.

Having disappointed you, how can I help?

If you only wanted to filter out short lines and leave the filled polygons as they are, i.e. just solve the easy part of the problem, then one could write a really simple class that inherits from SerialContourGenerator and overrides the lines and create_contour methods to call those of the base class, but then you can filter out what is returned and pass that on to the caller. This new class could be registered with contourpy as a valid contour generator so that you could specify it by a string name. Then, with some minor changes to matplotlib you'd be able to call your contour generator in the usual way. I haven't fully worked out what the API should be for this registering of a new contour generator but I am working on it; I have something that works today but it is experimental and is very likely to change.

If you wanted to modify your filled contours via a similar approach then you would need to know which of the points in each filled contour polygon are on a boundary and which are in the domain interior. This would need one or more new FillType possibilities to return extra arrays of booleans. This would give you enough information to work out which lines to remove, but you would have the problem of replacing them with points following the domain boundary, so this is much harder than the contour lines case.

kmuehlbauer commented 1 year ago

Thanks Ian for the detailed explanation and reasoning.

Having disappointed you, how can I help?

No, I'm not disappointed, quite the contrary. I'm happy to follow your suggestion with the SerialContourGenerator. Would it be possible to keep that on the Python side of things?

I also have a solution, which works for me fixing the filled contours like this

There is some fiddling with colormap, norm etc. and I do not get exactly matching colorbar compared to xarray/matplotlib but I can live with it. I did not experience any problems with respect to domain boundary/interior. At least I can't see any.

contours

If I knew where to start with the subclass, I'd try to get my above filter working along your suggestion.

Many thanks for taking the time!

ianthomas23 commented 1 year ago

It looks like your approach works fine in your output plot. I think you are lucky that the domain boundary and masked regions are not appreciably affecting your output. In the general case your filled contour solution won't always work. I suppose this is the difference between writing a filtering process for your own use, when you only have to please yourself, and writing one for a library when you have to consider how if affects every user and every use case.

Here is the subclass solution. This is the contents of a file I call my_contour_generator.py:

import contourpy

class MyContourGenerator(contourpy.SerialContourGenerator):
    def create_contour(self, level):
        return self.lines(level)

    def create_filled_contour(self, lower_level, upper_level):
        return self.filled(lower_level, upper_level)

    def filled(self, lower_level, upper_level):
        filled = super().filled(lower_level, upper_level)
        print(f"MyContourGenerator filled, levels={lower_level} {upper_level}")
        return filled

    def lines(self, level):
        lines = super().lines(level)
        print(f"MyContourGenerator lines, level={level}")
        return lines

# Accessing private dictionary in contourpy isn't recommended, but will
# suffice until there is a "register new contour generator" function.
contourpy._class_lookup["serial"] = MyContourGenerator

There are print statements where you would call your code to perform the filtering of the returned contour lines and filled contours. The lines() and filled() functions are the fundamental ones from contourpy's point of view but matplotlib uses create_contour() and create_filled_contour() for historical reasons, so these are just synonyms for lines() and filled().

Here is some test code to show this working, it has to be in the same directory as my_contour_generator.py:

import matplotlib.pyplot as plt
import my_contour_generator

fig, ax = plt.subplots()
ax.contour([[1, 2], [3, 4]], algorithm="serial")
plt.show()

So by registering your contour generator with contourpy under the name "serial", you can use matplotlib as usual as long as your specify the keyword argument algorithm="serial" and you get all the benefit of matplotlib's usual level-determining and colormapping, etc.

There will eventually be some correct way to register a new contour generator, and you won't have to override an existing algorithm name. This is needed at the moment as matplotlib checks that the name is valid, but this should be changed to defer that check to contourpy.

For completeness I should say that you could subclass Mpl2014ContourGenerator instead and register it under the name "mpl2014". Then you would be using the 2014 algorithm and wouldn't have to specify the algorithm keyword argument in your contour() and contourf() calls. But I always recommend using "serial" rather than "mpl2014" as it is usually faster and uses less RAM.

kmuehlbauer commented 1 year ago

Thank you very much again, Ian. That was very helpful. I've taken the chance to refine the filtering to use polygon-area as threshold (instead of vertex-count). That way issues with domain-boundary are minimized as well as issues with open lines vs. closed polygons.

To parameterize the area-threshold I utilize an environment variable. That way the code just sticks to the current matplotlib-algorithm (mpl2015) if the area-treshold is zero or not set at all.

The very nice and comfortable thing is, that I can deploy this everywhere and keep backwards compatibility.

Here is my current code for reference:

def polyarea(p):
    """Calculate area of polygon using shoelace formula.

    https://en.wikipedia.org/wiki/Shoelace_formula

    .. math::

        A = \frac{1}{2} \sum_{i=1}^{n} x_i(y_{i+1} - y_{i-1})
    """
    area = 0.0
    x = p[:, 0]
    y = p[:, 1]
    for i in range(len(x)):
        area += x[i-1] * (y[i] - y[i-2])
    return abs(area) / 2.0

class FilteredContourGenerator(contourpy.Mpl2014ContourGenerator):
    def create_contour(self, level):
        return self.lines(level)

    def create_filled_contour(self, lower_level, upper_level):
        return self.filled(lower_level, upper_level)

    def filled(self, lower_level, upper_level):
        min_area = float(os.environ.get("MPL_MIN_CONTOUR_AREA", "0"))
        # get vertices/kinds
        vertices, kinds = super().filled(lower_level, upper_level)
        # return early if nothing to do
        if min_area == 0.:
            return vertices, kinds
        ncseg = []
        nckind = []
        # iterate over vertices/kinds
        for k, (seg, kind) in enumerate(zip(vertices, kinds)):
            nseg = []
            nkind = []
            # iterate over single contours
            start = np.where(kind == 1)[0]
            stop = np.where(kind == 79)[0]
            for s0, s1 in zip(start, stop):
                # calculate area and extend
                if polyarea(seg[s0: s1]) >= min_area:
                    nseg.extend(seg[s0: s1])
                    nkind.extend(kind[s0: s1])
            # combine again
            if nseg:            
                ncseg.append(np.vstack(nseg))
                nckind.append(nkind)
        return ncseg, nckind

    def lines(self, level):
        min_area = float(os.environ.get("MPL_MIN_CONTOUR_AREA", "0"))
        # get vertices/kinds
        vertices, kinds = super().lines(level)
        # return early if nothing to do
        if min_area == 0.:
            return vertices, kinds

        ncseg = []
        nckind = []
        # iterate over vertices/kinds
        for k, (seg, kind) in enumerate(zip(vertices, kinds)):
            # calculate area and append
            if polyarea(seg) >= min_area:
                ncseg.append(seg)
                nckind.append(kind)
        return ncseg, nckind

Example showcase:

import numpy as np
import matplotlib.cm as cm
import matplotlib.pyplot as plt

# overwrite contour algorithm
contourpy._class_lookup["mpl2014"] = FilteredContourGenerator

delta = 0.025
x = np.arange(-3.0, 3.0, delta)
y = np.arange(-2.0, 2.0, delta)
X, Y = np.meshgrid(x, y)
Z1 = np.exp(-X**2 - Y**2)
Z2 = np.exp(-(X - 1)**2 - (Y - 1)**2)
Z = (Z1 - Z2) * 2

fig, ax = plt.subplots(1, 2, figsize=(10, 5))
min_area = 0
# set area-threshold to zero
os.environ["MPL_MIN_CONTOUR_AREA"] = f"{min_area}"
CSF = ax[0].contourf(X, Y, Z)
CS = ax[0].contour(X, Y, Z, colors="k")
ax[0].clabel(CS, inline=True, fontsize=10)
ax[0].set_title(f'Simplest default with labels - min_area {min_area}')

min_area = 0.5
# set area-threshold to 0.5
os.environ["MPL_MIN_CONTOUR_AREA"] = f"{min_area}"
CSF = ax[1].contourf(X, Y, Z)
CS = ax[1].contour(X, Y, Z, colors="k")
ax[1].clabel(CS, inline=True, fontsize=10)
ax[1].set_title(f'Simplest default with labels - min_area {min_area}')

contours_example