boschresearch / pylife

a general library for fatigue and reliability
https://pylife.readthedocs.io
Apache License 2.0

Rainflow binning gives unexpected wrong behaviour #9

Closed jacco-oosterhuis closed 2 years ago

jacco-oosterhuis commented 2 years ago

Given slightly noisy data, the rainflow counting algorithm gives incorrect results, both because of the order of operations (counting before binning) and because of the way binning is performed.

import pylife
import pylife.stress.rainflow as RF
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "time": np.array([1.0, 2.0, 3.0, 4.0, 4.1, 4.8, 5.0, 6.0, 7.0, 8.0, 9.1,
                      10.1, 11.0, 12.0, 13.0, 14.0]),
    "value": np.array([2.2, 7.3, 4.2, 8.1, 8.4, 8.2, 2.1, 4.9, 4.2, 6.0, 1.0,
                       6.9, 4.0, 5.0, 2.0, 5.0]),
})
# this should have four cycles: 5-4, 7-4, 4-5, 2-6
rfc = RF.FourPointDetector(recorder=RF.LoopValueRecorder())
bins = np.linspace(1,8,8)
n_bins = len(bins)
rfc.process(df["value"])
res = rfc.recorder.matrix(bins = bins)
result_df = pd.DataFrame(res[0],columns=res[2][:n_bins-1])
result_df["from"] = res[1][:n_bins-1]
result_df = result_df.melt(var_name="to", id_vars = "from")
result_df[result_df["value"] >0]

**Expected result**
"""
output:
    from   to  value
24   5.0  4.0    1.0
27   7.0  4.0    1.0
31   4.0  5.0    1.0
36   2.0  6.0    1.0
"""
First, a cycle from 4 to 4 (see the observed result) doesn't make much sense in my opinion. For this reason binning should be done _before_ counting, not after.
Second, 4.9 (in "value") gets binned to 4 because of the way binning is done internally, which is undesirable.

**Observed result**
"""
output:
    from   to  value
24   4.0  4.0    1.0
27   7.0  4.0    1.0
31   4.0  5.0    1.0
36   2.0  6.0    1.0
"""


johannes-mueller commented 2 years ago

Well, that is the way the bins are situated. If you check the unbinned collective

In [7]: rfc.recorder.collective
Out[7]: 
   from   to
0   7.3  4.2
1   4.9  4.2
2   2.1  6.0
3   4.0  5.0

there is a loop with index 1 from 4.9 to 4.2. If you have only one bin class between 4 and 5, both from and to will be in it. It just means that both from and to are "somewhere between 4.0 and 5.0".

That is why LoopValueRecorder.histogram() returns an IntervalIndexed 2D histogram like

In [8]: hist = rfc.recorder.histogram(bins=np.linspace(1,8,8))

In [9]: hist[hist>0.0]
Out[9]: 
from        to        
(2.0, 3.0]  (6.0, 7.0]    1.0
(4.0, 5.0]  (4.0, 5.0]    1.0
            (5.0, 6.0]    1.0
(7.0, 8.0]  (4.0, 5.0]    1.0
dtype: float64

That clearly shows that there is one loop from interval (4.0, 5.0] to interval (4.0, 5.0].
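
The classification described above can be reproduced without pylife. The following is a minimal sketch using NumPy's `np.histogram2d` on the four loops of the unbinned collective. It is an illustration only and not necessarily how `LoopValueRecorder.histogram()` is implemented; the variable names are made up for this example. Note that `np.histogram2d` treats bins as half-open `[a, b)` (the last one is closed), so the interval notation printed here differs from the labels shown above.

```python
import numpy as np

# Loops from the unbinned collective shown above
loops_from = np.array([7.3, 4.9, 2.1, 4.0])
loops_to = np.array([4.2, 4.2, 6.0, 5.0])

edges = np.linspace(1, 8, 8)  # bin edges 1.0, 2.0, ..., 8.0

# counts[i, j] is the number of loops whose "from" value lies in bin i
# and whose "to" value lies in bin j
counts, from_edges, to_edges = np.histogram2d(
    loops_from, loops_to, bins=[edges, edges]
)

# Print only the occupied cells of the 2D histogram
for i, j in zip(*np.nonzero(counts)):
    print(f"from [{edges[i]}, {edges[i + 1]}) "
          f"to [{edges[j]}, {edges[j + 1]}): {counts[i, j]}")
```

The loop from 4.9 to 4.2 ends up in the cell whose "from" and "to" bins are both the one between 4 and 5, which is the entry the issue reports as a cycle "from 4 to 4".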

P.S. LoopValueRecorder.matrix() has been obsolete since 2.0.0, BTW.

jacco-oosterhuis commented 2 years ago

Ok, but isn't the purpose of binning discretization? This is how I interpret this. This also means that if two turning points, after binning, belong to the same bin, only one of them is actually a turning point. This affects the algorithm and outcome.

Even if it isn't, I think binning like

centers = (bins[1:] + bins[:-1]) / 2
res_df = np.digitize(df["value"], bins=centers)

makes more sense, as it rounds to the bin centers. (I used matrix() because this is easier for us in postprocessing =) ).
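
The midpoint-based binning proposed above can be tried out on the signal from the original report. This is a minimal sketch that assumes the same `bins` as in the report; the names `values`, `idx` and `snapped` are introduced here for illustration. Using the midpoints as thresholds for `np.digitize` maps each sample to the nearest entry of `bins`.

```python
import numpy as np

# "value" column from the original report
values = np.array([2.2, 7.3, 4.2, 8.1, 8.4, 8.2, 2.1, 4.9, 4.2, 6.0, 1.0,
                   6.9, 4.0, 5.0, 2.0, 5.0])

bins = np.linspace(1, 8, 8)           # 1.0, 2.0, ..., 8.0
centers = (bins[1:] + bins[:-1]) / 2  # 1.5, 2.5, ..., 7.5

# With len(bins) - 1 thresholds, np.digitize returns indices in
# 0 .. len(bins) - 1, so indexing into bins is always in range
idx = np.digitize(values, centers)
snapped = bins[idx]

print(snapped)
```

With this mapping 4.9 becomes 5.0 and 4.2 becomes 4.0, so the two samples no longer share a class.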

johannes-mueller commented 2 years ago

The purpose of the binning is to classify the collected hysteresis loops into a 2D histogram, hence the new names of the methods .histogram() and .histogram_numpy() (formerly .matrix()).

What you might want to do is to digitize the time signal before the rainflow analysis. You can do that easily by

bins = np.linspace(1.0, 8.0, 8)
digitized_signal = bins[np.digitize(timesignal, bins)]
rfc.process(digitized_signal)

As you said, this will drop all the hysteresis loops that happen to fall into the same bin.

Edit: corrected the np.digitize() call.
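
One caveat when applying the digitizing step to the signal from the original report: `np.digitize` returns `len(bins)` for samples at or above the last edge (here 8.1, 8.4 and 8.2), so indexing `bins` with the raw result would raise an `IndexError`. The sketch below clips the index as one possible workaround; the clipping and the variable names are assumptions made for this example, not part of pylife.

```python
import numpy as np

# "value" column from the original report
timesignal = np.array([2.2, 7.3, 4.2, 8.1, 8.4, 8.2, 2.1, 4.9, 4.2, 6.0, 1.0,
                       6.9, 4.0, 5.0, 2.0, 5.0])

bins = np.linspace(1.0, 8.0, 8)

# np.digitize returns i with bins[i - 1] <= x < bins[i]; values at or above
# bins[-1] yield len(bins), so clip to the last valid index (assumption:
# out-of-range samples are assigned to the top class)
idx = np.clip(np.digitize(timesignal, bins), 0, len(bins) - 1)
digitized_signal = bins[idx]

print(digitized_signal)

# The digitized signal could then be passed to the detector as in the
# snippet above, e.g. rfc.process(digitized_signal)
```

Here 4.9 and 4.2 both map to 5.0, so the small loop between them collapses before the rainflow analysis sees it.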

johannes-mueller commented 2 years ago

Closing this. Please reopen if there are more things to discuss.