arvkevi / kneed

Knee point detection in Python :chart_with_upwards_trend:
https://kneed.readthedocs.io
BSD 3-Clause "New" or "Revised" License
734 stars 74 forks source link

y_transform doesn't work with unevenly spaced x values #98

Open MitchellAcoustics opened 3 years ago

MitchellAcoustics commented 3 years ago

https://github.com/arvkevi/kneed/blob/174225bd407ebc661a6d583c68b2a40aa4070a84/kneed/knee_locator.py#L221

The transformation done to turn the curve into concave/increasing from a concave/decreasing curve doesn't work when you have unevenly spaced x values. When the np.flip(y) function is called it will match up the y values with the x values according to index, but not according to the actual original values. This means, if you have x values which get increasingly close together in the original curve, the transformed curve will actually become a convex/increasing curve.

Unfortunately I can't include a reproducible example right now, but hopefully this will demonstrate what I mean.

output

As you can see, in the original curve, the x values start quite far apart, then get increasingly close together. The y_transform flips the y values, but does not preserve their proper spacing relative to the curve so the shape doesn't change as expected. We can also see that the original curve would have a knee at around 1.5, but the transformed curve would have an elbow at around 3.85.

I don't really know of a way to fix this, but I find it a bit strange to transform the curve this way. I haven't dug through all of the source code, but if I understand the original paper correctly, the difference curve is calculated according to the difference from a straight line. Why not just change what straight line is being compared against for the difference calculation for each of the four convex/concave and increasing/decreasing combinations?

In the case of this example (i.e. concave/decreasing), the normalized straight line would be y = -x + 1 and it looks like that would result in a reasonable difference curve:

# Calculate the difference curve
y_line = -x_normalized + 1

y_difference = y_normalized - y_line
x_difference = x_normalized.copy()

plt.title("Normalized spline & difference curve")
plt.plot(x_normalized, y_normalized, '.', label='original curve');
plt.plot(x_normalized, y_line, '--', label='straight line comparison');
plt.plot(x_difference, y_difference, label='difference curve');
plt.legend()

output2

With that difference curve, it then appears that the knee could be correctly identified at around 0.5 (normalized x values). Then, for an increasing curve, the line to compare against is y = x, and for the convex curves, we take the absolute value of the difference.

arvkevi commented 3 years ago

Hi @MitchellAcoustics, thanks for checking out the repo. The x and y values are interpolated and normalized before they are transformed, so it should handle the issue of unevenly spaced x value input. Does this example help answer any questions?

import numpy as np
import matplotlib.pyplot as plt

from kneed import KneeLocator

n = 50
x = np.geomspace(1, 256, num=n)
y = np.linspace(0, 1, num=n)

plt.scatter(x, y)

scatter_kneed

Kneed seems to find a reasonable knee point.

kneed = KneeLocator(x, y, curve='concave', direction='increasing')

print(kneed.knee)
46.883891195871875
kneed.plot_knee()

uneven_knee

yongnuan commented 2 years ago

Hi Thanks for this great algorithm. However I am struggling to get it work when x becomes log(x). For example for your same above example, if I did below, the found knee point is at end point which is not correct.

import numpy as np import matplotlib.pyplot as plt

from kneed import KneeLocator

n = 50 x = np.linspace(1, 256, num=n) x1=np.log(x) y = np.linspace(0, 1, num=n)

kneed = KneeLocator(x1, y, curve='convex', direction='increasing')

knee

yongnuan commented 2 years ago

Hi

I keep playing with interpolation method, polynomial degree and sensitivity parameters and I found the results change a lot by changing these parameters. and the best result I got is still not optimal. Now my question is how do we select the parameters automatically? I am using this algorithm to detect elbow point automatically for any curve. Could you please help ? Thank you so much.

import numpy as np import matplotlib.pyplot as plt

from kneed import KneeLocator

n = 50

x = np.linspace(1, 256, num=n) x1=np.log(x) y = np.linspace(0, 1, num=n) kneed = KneeLocator(x1, y, S=1,curve='convex', direction='increasing',online=True,interp_method="polynomial",polynomial_degree=3)

print(kneed.knee) kneed.plot_knee_normalized() kneed.plot_knee() elbow

vhu43 commented 1 year ago

If I may chime in, I think the issue is that only the y value is transformed. In knee_locator.py, there is a transform_y function that handles the entire transform. I think this could be fixed by adding a transform_x function that handles an x transformation.

    def transform_x(x: Iterable[float], direction: str, curve: str) -> float:
        """transform x to concave, increasing based on given direction and curve"""
        # convert elbows to knees
        if direction == "decreasing" and curve == "concave":
            x = x.max() - x
        elif direction == "increasing" and curve == "convex":
            x = x.max() - x

       return x
    def transform_y(y: Iterable[float], direction: str, curve: str) -> float:
        """transform y to concave, increasing based on given direction and curve"""
        # convert elbows to knees
        if curve == "convex":
            y = y.max() - y

       return y

Then changing this part of the code

 # Step 2: normalize values
        self.x_normalized = self.__normalize(self.x)
        self.y_normalized = self.__normalize(self.Ds_y)

        # Step 3: Calculate the Difference curve
        self.y_normalized = self.transform_y(
            self.y_normalized, self.direction, self.curve
        )

        self.x_normalized = self.transform_x(
            self.x_normalized, self.direction, self.curve
        )
        # normalized difference curve