CPCLAB-UNIPI / SIPPY

Systems Identification Package for PYthon
GNU Lesser General Public License v3.0
269 stars 92 forks source link

Regarding prediction and identified coefficient of input/output models. #52

Closed SelmaCho closed 7 months ago

SelmaCho commented 8 months ago

Dear SIPPY staff,

I am a beginner of SIPPY to use it for developing MPC framework.

I am studying system identification and SIPPY based on the book published by A. Kumar and Flores-Cerrillo, "Machine learning in python for dynamic process systems".

While I read this book and try some example codes, I wonders how to predict new output of ARX and extract coefficients with difference equation form.

In the manual, it is written that using fset.validation is proper to predict new output; However, its description is ambiguous. I would appreciate you all if you let me know how to predict the new output or extract coefficients with difference equation format.

Here is my toy problem code:

from sippy import system_identification from sippy import functionset as fset import numpy as np import pandas as pd import matplotlib.pyplot as plt from sklearn.preprocessing import StandardScaler from sklearn.metrics import r2_score # model diagnostics from statsmodels.graphics.tsaplots import plot_acf # model diagnostics

temp = pd.read_csv('simpleProcess.csv') data = np.array(temp) u = data[:,0, None]; y = data[:,1, None]

u_scaler = StandardScaler(with_std=False) y_scaler = StandardScaler(with_std=False) u_centered = u_scaler.fit_transform(u) y_centered = y_scaler.fit_transform(y)

Id_ARX = system_identification(y_centered,u_centered, 'ARX', ARX_orders=[3,3,2]) # y(k)-y(k-1)-y(k-2)-y(k-3)=u(k-1)+u(k-2)+u(k-3) y_centered_pred = np.transpose(Id_ARX.Yid) y_centered_pred2 = fset.validation(Id_ARX, y_centered, u_centered, np.linspace(1,999,999), 1)

plt.figure() plt.plot(y_centered, 'o', label='raw data') plt.plot(y_centered_pred, '--', label='ARX fit') plt.legend(); plt.xlabel('k'); plt.ylabel('y') print('Fit accuracy=', r2_score(y_centered,y_centered_pred))

res = y_centered-y_centered_pred plot_acf(res,lags=50) plt.xlabel('lag')

ValueError Traceback (most recent call last) Cell In[112], line 21 19 Id_ARX = system_identification(y_centered,u_centered, 'ARX', ARX_orders=[3,3,2]) # y(k)-y(k-1)-y(k-2)-y(k-3)=u(k-1)+u(k-2)+u(k-3) 20 y_centered_pred = np.transpose(Id_ARX.Yid) ---> 21 y_centered_pred2 = fset.validation(Id_ARX, y_centered, u_centered, np.linspace(1,999,999), 1) 23 plt.figure() 24 plt.plot(y_centered, 'o', label='raw data')

ValueError: A value (0.0) in x_new is below the interpolation range's minimum value (1.0).

RBdC commented 8 months ago

Dear SelmaCho, to test your code and check the error, you need to share your data file ('simpleProcess.csv').. send it to us by mail, if the case.

Please consider that you find the output of your model - obtained with the LLS approach - within the output attribute as Id_ARX.Yid; the ARX coefficients in the difference equation form can be extracted from the variable Id_ARX.G (which is a transfer function relating input u and output y) with a simple arrangement, recalling that yk-t = z^-t*y, where z is the shift operator.

The syntax for fset.validation - usually to be used for model validation on new data, not the same dataset used for model identification - appears wrong, as input/output position needs to be swapped, as: fset.validation(Id_ARX, u_centered, y_centered, np.linspace(1,999,999), 1)

Best RBdC

SelmaCho commented 8 months ago

simpleProcess.csv

Thank you for your kind reply, RBDC.

I've attached it by github link because I cannot find your email address ^^;; I have more questions. 1) Regarding yk-t = z^(-t*y) the Id_ARX.G is following. Sorry for my lack knowledge, I couldn't get an idea to extract the coefficient from this transfer function. Can you give me a hint?

(0.756 z^2 + 0.08442 z + 0.1733) / (z^5 - 0.2474 z^4 - 0.1489 z^3 - 0.1318 z^2)

dt = 1.0

2) The syntax for fset.validation As you kindly explained, I changed the location of inputs. But it has still problem related to interpolation. fset.validation(Id_ARX, u_centered, y_centered, np.linspace(1,999,999), 1)

The reason that I need to use a method-function for predicting new output is to optimize manipulated variables.. can fset.validation be used like sklearn's predict method?


ValueError Traceback (most recent call last) Cell In[126], line 1 ----> 1 fset.validation(Id_ARX, u_centered, y_centered, np.linspace(1,999,999), 1)

File ~\Desktop\Python\SysID\sippy\functionset.py:192, in validation(SYS, u, y, Time, k, centering) 189 for i in range(ydim): 190 # one-step ahead predictor 191 if k == 1: --> 192 T, Y_u = cnt.forced_response((1/SYS.H[i,0])*SYS.G[i,:], Time, u) 193 T, Y_y = cnt.forced_response(1 - (1/SYS.H[i,0]), Time, y[i,:] - y_rif[i]) 194 Yval[i,:] = (Y_u + np.atleast_2d(Y_y) + y_rif[i])

File ~\anaconda3\envs\SysIDBox\Lib\site-packages\control\timeresp.py:407, in forced_response(sys, T, U, X0, transpose, interpolate, squeeze) 403 dsys = (A, B, C, D, sys_dt) 405 # Use signal processing toolbox for the discrete time simulation 406 # Transpose the input to match toolbox convention --> 407 tout, yout, xout = sp.signal.dlsim(dsys, np.transpose(U), T, X0) 409 if not interpolate: 410 # If dt is different from sys.dt, resample the output 411 inc = int(round(dt / sys_dt))

File ~\anaconda3\envs\SysIDBox\Lib\site-packages\scipy\signal_ltisys.py:3556, in dlsim(system, u, t, x0) 3553 u = u[:, np.newaxis] 3555 u_dt_interp = interp1d(t, u.transpose(), copy=False, bounds_error=True) -> 3556 u_dt = u_dt_interp(tout).transpose() 3558 # Simulate the system 3559 for i in range(0, out_samples - 1):

File ~\anaconda3\envs\SysIDBox\Lib\site-packages\scipy\interpolate_polyint.py:80, in _Interpolator1D.call(self, x) 59 """ 60 Evaluate the interpolant 61 (...) 77 78 """ 79 x, x_shape = self._prepare_x(x) ---> 80 y = self._evaluate(x) 81 return self._finish_y(y, x_shape)

File ~\anaconda3\envs\SysIDBox\Lib\site-packages\scipy\interpolate_interpolate.py:755, in interp1d._evaluate(self, x_new) 753 y_new = self._call(self, x_new) 754 if not self._extrapolate: --> 755 below_bounds, above_bounds = self._check_bounds(x_new) 756 if len(y_new) > 0: 757 # Note fill_value must be broadcast up to the proper size 758 # and flattened to work here 759 y_new[below_bounds] = self._fill_value_below

File ~\anaconda3\envs\SysIDBox\Lib\site-packages\scipy\interpolate_interpolate.py:784, in interp1d._check_bounds(self, x_new) 782 if self.bounds_error and below_bounds.any(): 783 below_bounds_value = x_new[np.argmax(below_bounds)] --> 784 raise ValueError("A value ({}) in x_new is below " 785 "the interpolation range's minimum value ({})." 786 .format(below_bounds_value, self.x[0])) 787 if self.bounds_error and above_bounds.any(): 788 above_bounds_value = x_new[np.argmax(above_bounds)]

ValueError: A value (0.0) in x_new is below the interpolation range's minimum value (1.0).


Regards

Selma Cho

SelmaCho commented 8 months ago

I found the answer about question 1). tf = (0.756 z^2 + 0.08442 z + 0.1733) / (z^5 - 0.2474 z^4 - 0.1489 z^3 - 0.1318 z^2)

y[k]-0.2474y[k-5]-0.1489y[k-4]-0.1318y[k-3] = 0.1733u[k-1]+0.08442u[k-2]+0.756u[k-3]

Really. thanks.

RBdC commented 8 months ago

Hi SelmaCho,

about Q1, you are close, but the right solution is as follows:

The general difference equation is: yk + a1 yk- 1 + ...  + ana yk-na = b1 uk-1-theta + .... + bnb uk-nb-theta

here you have from tf = (0.756 z^2 + 0.08442 z + 0.1733) / (z^5 - 0.2474 z^4 - 0.1489 z^3 - 0.1318 z^2) y[k]-0.2474y[k-1]-0.1489y[k-2]-0.1318y[k-3] = 0.756u[k-1-2]+0.08442u[k-2-2]+0.1733u[k-3-2] since you set ARX_orders=[3,3,2], you have 3 coef.s in NUM (na =3) and 3 coefs. in DEN (nb =3) , and theta (the time-delay) is indeed 2


about Q2, I run the code, and I did get no errors.. (e.g., Fit accuracy= 0.8843802125888518) I see ..."control\timeresp.py", so guess your error comes from an "old issue".. Please refer to this topic: https://github.com/CPCLAB-UNIPI/SIPPY/issues/42 " we noted that the cnt.lsim function changed its input format depending on which version you have. Testing EX_ARMAX and EX_ARMAX_MIMO both on control versions 0.8.3 and 0.9.0 it worked without errors. "

Let us know which version of control you have (control.version) and if updating it solves the problem.

Best RBdC

SelmaCho commented 8 months ago

Dear RBdC Thank you for your kindly reply.. About Q2, I re-installed control 0.9.0 but it still has problem. Also, control 0.8.3 version has a lot of problem like numpy integer problem and compatibility. It seems Scipy's error since error appears on scipy interpolate problem.

Regards Selma

RBdC commented 8 months ago

Dear Selma, thanks for your specification. Yeah, control is a bad beast in terms of compatibility. Consider it works for me with control 0.8.3 and scipy 1.4.1 Best RBdC

SelmaCho commented 8 months ago

Dear RBdC

Thank you. Another example does not show an error when using control version 0.9.0 and scipy version 1.11.1.

I have one more question.

What is the role of "Time" in fset.validation? The explanation in document is quite confusing, since it has k which delineate time-delay. Is "Time" to allocate time stamp to every samples?

Do I correctly understand? for example, we have u and y which are of 100 samples, respectively. Time should be np.linspace(0,99,100). Then, from the start point, samples of u and y have time stamp (0-th, 1-th, .... 99-th). Running fset.validation(ARX, u, y, np.linspace(0,99,100), k=1) results predicted y (0+k-th, 1+k-th, .. 99-k-th).

P.S. Here is a code and data which does not cause an error. IndustrialFiredHeater_SISO_Validation.csv IndustrialFiredHeater_SISO.csv

--

import packages

import matplotlib.pyplot as plt, numpy as np, control from sklearn.preprocessing import StandardScaler from sippy import system_identification as SysID from statsmodels.tsa.stattools import ccf from sippy import functionset as fset

package settings

plt.rcParams.update({'font.size': 14})

assert(control.version < '0.9'), "To avoid errors, downgrade the control package to a version < 0.9.0. See https://github.com/CPCLAB-UNIPI/SIPPY/issues/48 for details."

read fired heater identification test data and plot

data = np.loadtxt('IndustrialFiredHeater_SISO.csv', delimiter=',') FG = data[:,0, None] TO = data[:,1, None]

plots

plt.figure(figsize=(6,2.5)), plt.plot(FG, 'steelblue', linewidth=0.8) plt.ylabel('u(k)'), plt.xlabel('k'), plt.xlim(0) plt.grid(which='both', axis='y', linestyle='--')

plt.figure(figsize=(6,2.5)), plt.plot(TO, 'g', linewidth=0.8) plt.ylabel('y(k)'), plt.xlabel('k'), plt.xlim(0) plt.grid(which='both', axis='y', linestyle='--') plt.show()

center data before model fitting

u_scaler = StandardScaler(with_std=False) FG_centered = u_scaler.fit_transform(FG)

y_scaler = StandardScaler(with_std=False) TO_centered = y_scaler.fit_transform(TO)

fit ARX model using AIC

ARXmodel = SysID(TO_centered, FG_centered, 'ARX', IC='AIC', na_ord=[1,20], nb_ord=[1,20], delays=[0,5]) print(ARXmodel.G)

read fired heater validation data and plot

data_val = np.loadtxt('IndustrialFiredHeater_SISO_Validation.csv', delimiter=',') FG_val, TO_val = data_val[:,0, None], data_val[:,1, None]

plots

plt.figure(figsize=(6,2.5)) plt.plot(FG_val, 'steelblue', linewidth=0.8) plt.ylabel('u(k) validation'), plt.xlabel('k'), plt.xlim(0) plt.grid(which='both', axis='y', linestyle='--')

plt.figure(figsize=(6,2.5)) plt.plot(TO_val, 'g', linewidth=0.8) plt.ylabel('y(k) validation'), plt.xlabel('k'), plt.xlim(0) plt.grid(which='both', axis='y', linestyle='--') plt.show()

center validation data

FG_val_centered, TO_val_centered = u_scaler.fit_transform(FG_val), y_scaler.fit_transform(TO_val)

get model's 1-step ahead and 5-step ahead predictions, and simulation responses on validation dataset

from sippy import functionset as fset

TO_val_predicted_1step_centered = np.transpose(fset.validation(ARXmodel, FG_val_centered, TO_val_centered, np.linspace(0,299,300), k=1)) TO_val_predicted_5step_centered = np.transpose(fset.validation(ARXmodel, FG_val_centered, TO_val_centered, np.linspace(0,299,300), k=5)) TO_val_simulatedcentered, , _ = control.matlab.lsim(ARXmodel.G, FG_val_centered[:, 0], np.linspace(0,299,300))

TO_val_predicted_1step = y_scaler.inverse_transform(TO_val_predicted_1step_centered) TO_val_predicted_5step = y_scaler.inverse_transform(TO_val_predicted_5step_centered) TO_val_simulated = y_scaler.inverse_transform(TO_val_simulated_centered[:, None])

plt.figure(figsize=(18,7.5)) plt.plot(TO_val, 'g', linewidth=0.8, label='Measurements') plt.plot(TO_val_predicted_1step, 'r', linewidth=0.8, label='1-step ahead predictions') plt.plot(TO_val_predicted_5step, 'c', linewidth=0.8, label='5-step ahead predictions') plt.plot(TO_val_simulated, 'm', linewidth=0.8, label='Simulation response') plt.title('Validation data') plt.ylabel('y(k): Measured vs predicted/simulated'), plt.xlabel('k') plt.legend(), plt.xlim(0) plt.grid(which='both', axis='y', linestyle='--') plt.show()

Best regards.

Selma

RBdC commented 7 months ago

Dear Selma, thank you for your new post, the data, and the code. I ran it and worked ok.

What is the role of "Time" in fset.validation? The explanation in document is quite confusing, since it has k which delineate time-delay. Is "Time" to allocate time stamp to every samples? --> Time is indeed the time vector to define the prediction of your model.

Do I correctly understand? for example, we have u and y which are of 100 samples, respectively. Time should be np.linspace(0,99,100). Then, from the start point, samples of u and y have time stamp (0-th, 1-th, .... 99-th). Running fset.validation(ARX, u, y, np.linspace(0,99,100), k=1) results predicted y (0+k-th, 1+k-th, .. 99-k-th). --> Yes, you predict the output of the model k-steps ahead.

Best RBdC