cjekel / piecewise_linear_fit_py

fit piecewise linear data for a specified number of line segments
MIT License

Feature request: remove output messages, restricted result range #3

Closed vkhodygo closed 6 years ago

vkhodygo commented 6 years ago

Hi again!

Sorry for disturbing, but I would like to ask you a few more things. First, is it possible to add a key parameter which turns off all the output messages (except errors)? I have to deal with massive numbers of files and I need to see my own print messages.

Second, can your algorithm in general use boundaries for resulting parameters? Say, I know in general, that the values of slopes a priori lie in range [0:2], thus, the result has to be in the range.

Sincerely, V.

cjekel commented 6 years ago

First, is it possible to add a key parameter which turns off all the output messages (except errors)? I have to deal with massive numbers of files and I need to see my own print messages.

you can use something like

fit(4,disp=False)

to fit to four line segments while turning off the optimization output...

What is often printed are numpy warnings, e.g. divide by zero. You can look at https://stackoverflow.com/questions/14463277/how-to-disable-python-warnings to disable warnings in Python.
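A minimal sketch of the warning suppression mentioned above, using only the standard library `warnings` module. The functions `noisy_division` and `run_quietly` are hypothetical stand-ins for illustration and are not part of pwlf; the idea is just to wrap the call that emits the RuntimeWarning so it stays quiet without silencing warnings elsewhere in the script.

```python
import warnings


def noisy_division(numerator, denominator):
    """Hypothetical stand-in for a numerical routine that emits a RuntimeWarning."""
    if denominator == 0:
        warnings.warn("invalid value encountered", RuntimeWarning)
        return float("nan")
    return numerator / denominator


def run_quietly(func, *args):
    """Call func(*args) with RuntimeWarning suppressed for this call only."""
    with warnings.catch_warnings():
        # The filter is restored automatically when the block exits.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return func(*args)


if __name__ == "__main__":
    print(run_quietly(noisy_division, 1.0, 0.0))
```

Scoping the filter to a single call keeps your own print messages and any other warnings visible.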

Does this help?

The only print() in the code is

print(res)

which displays the optimization results?

Would you like a keyword to turn this off?

Second, can your algorithm in general use boundaries for resulting parameters? Say, I know in general, that the values of slopes a priori lie in range [0:2], thus, the result has to be in the range.

The fit doesn't solve for the slopes, it actually solves for the y locations provided x break points. From this solution the slopes can be calculated.
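To illustrate the last point, here is a small sketch of how slopes can be computed once the y values at the break points are known. The function name `slopes_from_breaks` and its arguments are assumptions made for this example, not the pwlf API.

```python
def slopes_from_breaks(break_x, break_y):
    """Return the slope of each line segment between consecutive break points."""
    if len(break_x) != len(break_y):
        raise ValueError("break_x and break_y must have the same length")
    points = list(zip(break_x, break_y))
    # Each segment runs from one break point to the next.
    return [
        (y1 - y0) / (x1 - x0)
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    ]


if __name__ == "__main__":
    # Two segments: (0, 0) -> (1, 2) and (1, 2) -> (3, 3)
    print(slopes_from_breaks([0.0, 1.0, 3.0], [0.0, 2.0, 3.0]))
```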

I'm working on implementing fixing the x,y locations at the boundaries (beginning and end). A few people have requested this, but it's not ready (or working) at the moment.

vkhodygo commented 6 years ago

you can use something like

fit(4,disp=False)

No, I still get messages like

     fun: 4.895025824377757e-07
 message: 'Optimization terminated successfully.'
    nfev: 393
     nit: 12
 success: True
       x: array([2.6693644 , 3.32241918])

And when I try to use fitfast, it becomes even worse (especially with 4 cores running).

What is often printed are numpy warnings, e.g. divide by zero.

I observe some messages, indeed, they look like this:

RuntimeWarning: invalid value encountered in double_scalars
  A[i,i] = A[i,i] + sum(((sepDataX[i] - breaks[i+1]) ** 2)) / ((breaks[i+1] - breaks[i]) ** 2)
/home/user/.local/lib/python3.6/site-packages/pwlf/pwlf.py:248: RuntimeWarning: invalid value encountered in double_scalars
  A[i,i+1] = A[i,i+1] - sum((sepDataX[i] - breaks[i]) * (sepDataX[i] - breaks[i+1])) / ((breaks[i+1] - breaks[i]) ** 2)
/home/user/.local/lib/python3.6/site-packages/pwlf/pwlf.py:249: RuntimeWarning: invalid value encountered in double_scalars
  B[i] = B[i] + (-sum(sepDataX[i] * sepDataY[i]) + breaks[i+1] * sum(sepDataY[i])) / (breaks[i+1] - breaks[i])
/home/user/.local/lib/python3.6/site-packages/pwlf/pwlf.py:241: RuntimeWarning: invalid value encountered in double_scalars
  A[i,i-1] = A[i,i-1] - sum((sepDataX[i-1] - breaks[i-1]) * (sepDataX[i-1] - breaks[i])) / ((breaks[i] - breaks[i-1]) ** 2)
/home/user/.local/lib/python3.6/site-packages/pwlf/pwlf.py:242: RuntimeWarning: invalid value encountered in double_scalars
  A[i,i] = A[i,i] + sum((sepDataX[i-1] - breaks[i-1]) ** 2) / ((breaks[i] - breaks[i-1]) ** 2)
/home/user/.local/lib/python3.6/site-packages/pwlf/pwlf.py:243: RuntimeWarning: invalid value encountered in double_scalars
  B[i] = B[i] + (sum(sepDataX[i-1] * sepDataY[i-1]) - breaks[i-1] * sum(sepDataY[i-1])) / (breaks[i] - breaks[i-1])

and

/home/user/.local/lib/python3.6/site-packages/numpy/core/_methods.py:112: RuntimeWarning: invalid value encountered in subtract
  x = asanyarray(arr - arrmean)

Is that what you mean?

Would you like a keyword to turn this off?

Yes, that would be great. Since I can't be sure that results in such cases are correct, I need to know what datasets lead to this (and I have a few thousands =/).
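One possible way to find out which datasets trigger these warnings is to record them per file instead of letting them print. The sketch below uses the standard library `warnings.catch_warnings(record=True)`; `toy_fit`, `fit_and_collect_warnings`, and `find_suspect_datasets` are hypothetical names, and `toy_fit` only imitates a fit that warns on degenerate input.

```python
import warnings


def toy_fit(values):
    """Hypothetical stand-in for a fit: warns when the data has zero spread."""
    spread = max(values) - min(values)
    if spread == 0:
        warnings.warn("invalid value encountered", RuntimeWarning)
        return float("nan")
    return spread


def fit_and_collect_warnings(fit_func, dataset):
    """Run fit_func(dataset) and return (result, list of RuntimeWarning messages)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        result = fit_func(dataset)
    messages = [
        str(w.message) for w in caught if issubclass(w.category, RuntimeWarning)
    ]
    return result, messages


def find_suspect_datasets(fit_func, datasets):
    """Map each dataset name that raised RuntimeWarnings to its messages."""
    suspects = {}
    for name, data in datasets.items():
        _, messages = fit_and_collect_warnings(fit_func, data)
        if messages:
            suspects[name] = messages
    return suspects


if __name__ == "__main__":
    datasets = {"good.dat": [1.0, 2.0, 3.0], "flat.dat": [2.0, 2.0, 2.0]}
    print(find_suspect_datasets(toy_fit, datasets))
```

With a few thousand files, the returned dictionary gives a list of names to inspect by hand.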

I'm working on implementing fixing the x,y locations at the boundaries (beginning and end). A few people have requested this, but it's not ready (or working) at the moment.

Well, that means that I have to do more things manually.

P.S. It seems that your fitfast doesn't work as planned, however, I need to check, what data breaks it, and open a new issue.

cjekel commented 6 years ago

Thanks for the PR!

  1. https://github.com/cjekel/piecewise_linear_fit_py/commit/66c02453f4423656b71232d36b0f48fe9f59f8bd now defaults to prints being off. Use

    piecewise_lin_fit(x, y, disp_res=True)

    to turn prints on. This doesn't get rid of numpy warnings.

  2. https://github.com/cjekel/piecewise_linear_fit_py/commit/9959ec25ae71b824039d1ef5ff23d6c9ba5a44e7 fitfast() now defaults to a population of 2. This should be faster than the differential evolution for all cases, at the cost of possibly not finding a good solution. Increase the population of fitfast() to find a better solution.

  3. Can you describe your application for boundary slopes? Are you trying to force a solution range, or speed up the optimization implementation?

vkhodygo commented 6 years ago

Good, now I can see only the important messages! I haven't tried the updated fitfast yet; I hope it works properly now.

I know that slopes can't be, say, negative or greater than 2. Thus, all values that are not in range can't be accepted. I know that scipy, for example, allows bounds on parameters, but your package is better for my purposes.

P.S. Your updated version definitely looks faster; however, now I get very strange results. This drives me crazy %)

[images: msd_plot 6 _msd_plot 0 1 3_0 22]

Same dataset (msd shifted), but correct (at least acceptable) results only with the old algorithm. I use the default code from your examples and 3 segments to fit the data.

cjekel commented 6 years ago

Edit*: I've found a working example that breaks... will be working on a hotfix. Sorry about this.

Can you send me that msd shifted dataset to troubleshoot? Or reproduce on a simple data set? Are you using version 0.2.3?

In the meantime you can revert to the old release by running

[sudo] pip uninstall pwlf
[sudo] pip install pwlf==0.1.7

cjekel commented 6 years ago

Fixed the weird prediction issues in 0.2.4. Sorry about that, not sure how that escaped my test function! I need to think about that more...

I know that slopes can't be, say, negative or greater than 2. Thus, all values that are not in range can't be accepted. I know that scipy, for example, allows bounds on parameters, but your package is better for my purposes.

Okay. I need to think about this more, but I think it can be done by setting up inequality constraints.

They would look something like

b_l <= b_1 <= b_h

Then 1 <= b_1/b_l and 1 >= b_1/b_h.

This might be useful in the future: https://math.stackexchange.com/questions/69613/linear-least-squares-with-inequality-constraints
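As a rough illustration of the bound b_l <= b_1 <= b_h, the sketch below uses a quadratic penalty rather than true inequality-constrained least squares, so it is a different and simpler technique than the one in the link. It works with differences instead of the ratios b_1/b_l, since a ratio is undefined when the lower bound is 0, as in the [0, 2] range requested here. The name `slope_bound_penalty` and the `weight` parameter are assumptions for this example only.

```python
def slope_bound_penalty(slopes, lower, upper, weight=1.0):
    """Return a quadratic penalty that is zero when every slope is in [lower, upper]."""
    penalty = 0.0
    for slope in slopes:
        if slope < lower:
            penalty += (lower - slope) ** 2
        elif slope > upper:
            penalty += (slope - upper) ** 2
    return weight * penalty


if __name__ == "__main__":
    # Slopes reported later in this thread, checked against the range [0, 2]
    print(slope_bound_penalty([1.89755427, -0.67540119, 2.00421613], 0.0, 2.0))
```

Such a term could be added to the objective that an optimizer minimizes, so that candidate solutions with out-of-range slopes score worse; it does not guarantee that the final slopes lie strictly inside the range.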

vkhodygo commented 6 years ago

Okay. I need to think about this more, but I think it can be done by setting up inequality constraints.

Thank you. The attached file is one of those with strange values. The first column contains time, the last one is the data I need. I import it, skip the first row with zero values and use numpy.log10() to linearize it.

data = pd.read_csv(file, sep=r'\s+', engine='python', usecols=(0, 3), skiprows=1, names=('lag', 'msd_shift'))
lag = np.log10(data['lag'].values)
msd_shift = np.log10(data['msd_shift'].values)

data.zip

cjekel commented 6 years ago

Works well in version 0.2.4!

[image: temp]

import numpy as np
import matplotlib.pyplot as plt
import pwlf
import pandas as pd

# read the lag (column 0) and shifted MSD (column 3), skipping the first row
data = pd.read_csv('ref_msd.16.bin', sep=r'\s+', engine='python', usecols=(0, 3), skiprows=1, names=('lag', 'msd_shift'))
# take log10 of both columns to linearize the data
lag = np.log10(data['lag'].values)
msd_shift = np.log10(data['msd_shift'].values)

# fit three line segments and predict at the original x locations
myPWLF = pwlf.PiecewiseLinFit(lag, msd_shift)
myPWLF.fit(3)
yhat = myPWLF.predict(lag)

# plot the data against the fitted piecewise linear model
plt.figure()
plt.plot(lag, msd_shift, 'o')
plt.plot(lag, yhat)
plt.show()

vkhodygo commented 6 years ago

Great! However, could you please take a look at the values of the slopes? They are a little bit strange:

myPWLF.slopes
array([ 1.89755427, -0.67540119, 2.00421613])

Upd. I'm pretty sure that the plot itself is correct. I have found the old one based on the initial algorithm. It seems that the first slope is identical in both cases; however, it's not clear what causes such behaviour.

[image: msd_plot 16]

Upd2. I feel that this is related to this question.

cjekel commented 6 years ago

I ended up creating a new function to evaluate the slopes by predicting at the break points. The results should be similar to the previous version. See https://github.com/cjekel/piecewise_linear_fit_py/commit/d59e0be0dda6e9a478650783dfff6e90b30da6c3 for the function.

I will push 0.2.5 to pypi shortly.

Edit: My previous interpretation of the beta parameters as slopes was incorrect. New 0.2.5 release should give you similar slope values as you had before.
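To show why the beta parameters are not the slopes themselves, here is a sketch under an assumption that is not spelled out in this thread: that the model has the form y = beta[0] + beta[1]*(x - b_0) + beta[2]*max(x - b_1, 0) + ..., so every beta after the first changes the slope at a break point. Under that assumption the slope of each segment is a running sum of the betas. The helper name `slopes_from_beta` is hypothetical.

```python
from itertools import accumulate


def slopes_from_beta(beta):
    """Return per-segment slopes, assuming beta[1:] are slope increments.

    beta[0] is treated as the intercept term and is ignored here.
    """
    return list(accumulate(beta[1:]))


if __name__ == "__main__":
    # Illustrative values only, not taken from a real fit
    print(slopes_from_beta([-4.0, 1.5, -0.5, 1.0]))
```

This agrees with the observation above that the first slope was identical in both versions, since the first slope equals beta[1] either way.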

vkhodygo commented 6 years ago

I've updated it but no result so far. I still get the same values:

>>> myPWLF.slopes
array([ 1.89755342, -0.67540057, 2.00421982])
>>> myPWLF.beta
array([-4.09022781, 1.89755342, -0.67540057, 2.00421982])

Edit. Sorry, everything works fine. I'm simply used to python3 (and pip3), and your instructions for the upgrade are for Python 2.

Edit #2: I think that it works as it should. You can close this issue.