What is the correct way to handle a multidimensional array?

kapkirl commented 5 years ago

I am trying to adapt the nonlinear regression sample:

import pandas
from gekko import GEKKO
import numpy as np
import matplotlib.pyplot as plt

# # measurements
xm = np.array([[80435, 33576, 3930495], [63320, 21365, 2515052],
[131294, 46680, 10339497], [64470, 29271, 3272846],
[23966, 7973, 3450144], [19863, 11429, 3427307],
[32139, 13114, 2462822], [78976, 26973, 5619715],
[32857, 10455, 3192817], [29400, 12808, 3665615],
[4667, 2876, 2556650], [21477, 10349, 6005812],
[9168, 4617, 2878631], [385112, 127609, 4063576],
[55522, 29954, 3632023], [155, 197, 507],
[160, 106, 336], [25, 23, 669], [86, 96, 751], [199, 235, 515],
[60, 83, 511], [8, 25, 187], [32, 59, 679], [11, 22, 365],
[322, 244, 2001], [172, 229, 1110], [41, 48, 447], [109, 144, 2386],
[23, 27, 319], [105, 204, 672], [77, 77, 2]])

ym = np.array([90,85,91,90,90,82,81,85,83,83,72,78,
74,92,90,28,26,13,12,22,25,5,10,15,50,54,4,28,10,7,6])

# GEKKO model
m = GEKKO()

# parameters
x = m.Param(value=xm, name='X')
y = m.CV(value=ym)
y.FSTATUS = 1

a1 = m.FV()
a1.STATUS=1

a2 = m.FV()
a2.STATUS=1

a3 = m.FV()
a3.STATUS=1

# regression equation
for i in range(len(x)):
    m.Equation(
        y[i] == np.log10(x[i][0]) * a1 +
                np.log10(x[i][1]) * a2 +
                np.log10(x[i][2]) * a3)

# regression mode
m.options.IMODE = 2

# optimize
m.solve(disp=False, GUI=False)

# print parameters
print('Optimized, a = ', str(a1), str(a2), str(a3))

plt.plot(y.value, ym, 'bo')
# plt.plot(xm, y.value, 'r-')
plt.show()

This gives the following error:

  File "/usr/local/lib/python3.6/dist-packages/gekko/gekko.py", line 1830, in solve
    self._write_csv()
  File "/usr/local/lib/python3.6/dist-packages/gekko/gk_write_files.py", line 184, in _write_csv
    raise Exception('Data arrays must have the same length, and match time discretization in dynamic problems')
Exception: Data arrays must have the same length, and match time discretization in dynamic problems

How can I handle this case?

APMonitor commented 5 years ago

Here is a summary of the modifications:

Use m.log10 instead of np.log10.
Define x as an Array and load each column (e.g. xm[:,0]) into the corresponding x[i].value separately.
Define the equation only once, not once per data row. IMODE=2 is efficient for large data sets this way because the equation is defined only once and all the data points are evaluated with that same expression.
Add a red line to the plot.
Print a1.value[0], a2.value[0], and a3.value[0] to display the numeric solution.
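
The point about loading each column separately can be checked with NumPy alone, before involving Gekko. This is a minimal sketch, assuming four rows taken from the data above and stored under the illustrative names `xm_demo` and `ym_demo` (these names are not part of the original code). Judging by the error message, each data array is expected to have the same length as the measured output, so the 2-D array is split into one 1-D column per input:

```python
import numpy as np

# Four rows of the measurement data, under illustrative names.
xm_demo = np.array([[155, 197, 507],
                    [160, 106, 336],
                    [25, 23, 669],
                    [86, 96, 751]])
ym_demo = np.array([28, 26, 13, 12])

# One 1-D column per input variable.
columns = [xm_demo[:, i] for i in range(xm_demo.shape[1])]

print(xm_demo.shape)               # (4, 3): data rows x inputs
print([c.shape for c in columns])  # [(4,), (4,), (4,)]

# Every column now has one value per data row, matching ym_demo.
print(all(len(c) == len(ym_demo) for c in columns))  # True
```

Each entry of `columns` is what would be assigned to one `x[i].value` in the full script.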

results

from gekko import GEKKO
import numpy as np
import matplotlib.pyplot as plt

# measurements
xm = np.array([[80435, 33576, 3930495], [63320, 21365, 2515052],
[131294, 46680, 10339497], [64470, 29271, 3272846],
[23966, 7973, 3450144], [19863, 11429, 3427307],
[32139, 13114, 2462822], [78976, 26973, 5619715],
[32857, 10455, 3192817], [29400, 12808, 3665615],
[4667, 2876, 2556650], [21477, 10349, 6005812],
[9168, 4617, 2878631], [385112, 127609, 4063576],
[55522, 29954, 3632023], [155, 197, 507],
[160, 106, 336], [25, 23, 669], [86, 96, 751], [199, 235, 515],
[60, 83, 511], [8, 25, 187], [32, 59, 679], [11, 22, 365],
[322, 244, 2001], [172, 229, 1110], [41, 48, 447], [109, 144, 2386],
[23, 27, 319], [105, 204, 672], [77, 77, 2]])

ym = np.array([90,85,91,90,90,82,81,85,83,83,72,78,
74,92,90,28,26,13,12,22,25,5,10,15,50,54,4,28,10,7,6])

# GEKKO model
m = GEKKO(remote=False)

# parameters
n = np.size(xm,1)
x = m.Array(m.Param,n)
for i in range(n):
    x[i].value = xm[:,i]
y = m.CV(value=ym)
y.FSTATUS = 1

a1 = m.FV()
a1.STATUS=1

a2 = m.FV()
a2.STATUS=1

a3 = m.FV()
a3.STATUS=1

# regression equation
m.Equation(y == m.log10(x[0]) * a1 + \
                m.log10(x[1]) * a2 + \
                m.log10(x[2]) * a3)

# regression mode
m.options.IMODE = 2

# optimize
m.solve(disp=True, GUI=False)

# print parameters
print('Optimized, a = ', str(a1.value[0]), str(a2.value[0]), str(a3.value[0]))

plt.plot(y.value, ym, 'bo')
plt.plot([0,max(ym)],[0,max(ym)],'r-')
plt.show()

APMonitor commented 5 years ago

This is a good question for StackOverflow. Could you post it there with tag [gekko]?

https://stackoverflow.com/questions/tagged/gekko

kapkirl commented 5 years ago

Thank you! Done https://stackoverflow.com/questions/57726954/what-is-the-correct-way-handle-with-multidimensional-array-in-gekko-nonlinear-re/57727056#57727056

jpatria commented 3 years ago

@APMonitor, I'd like to perform a linear regression with the same dataset as above, but when I do so, my adapted code gives me all 0s for my coefficients. That is, I want to solve y == b + b1*x1 + b2*x2 + b3*x3 ...

Would you be able to clarify the correct way of formulating the equation and setting up the FV?

APMonitor commented 3 years ago

Here are a few example problems for regression: https://apmonitor.com/che263/index.php/Main/PythonDataRegression

If you don't have constraints on the coefficients, then there are many options in Python besides Gekko: https://github.com/APMonitor/data_science/blob/master/06.%20Regression.ipynb

Here are a few more examples in Gekko: http://apmonitor.com/do/index.php/Main/DynamicEstimation (see examples 3 and 4).
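
For the unconstrained linear case asked about above, one non-Gekko option is ordinary least squares with `np.linalg.lstsq`. This is a minimal sketch, not the method from the linked pages. It assumes synthetic, noise-free data generated from known coefficients, and the names `X_demo`, `y_demo`, and `true_coef` are illustrative. The intercept b is handled by adding a column of ones to the design matrix:

```python
import numpy as np

# Synthetic inputs (assumption: 50 rows, 3 input columns).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0.0, 10.0, size=(50, 3))

# Known coefficients used to generate the data: b, b1, b2, b3.
true_coef = np.array([2.0, 0.5, -1.0, 3.0])

# Design matrix: the leading column of ones carries the intercept b.
A = np.column_stack([np.ones(len(X_demo)), X_demo])
y_demo = A @ true_coef  # noise-free outputs

# Least-squares fit of y == b + b1*x1 + b2*x2 + b3*x3.
coef, residuals, rank, singular_values = np.linalg.lstsq(A, y_demo, rcond=None)
b, b1, b2, b3 = coef

print('b, b1, b2, b3 =', b, b1, b2, b3)
```

With the data from this thread, `X_demo` would be replaced by `xm` and `y_demo` by `ym`. Because the sketch uses noise-free data, the fitted coefficients should match `true_coef` up to floating-point error.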

BYU-PRISM / GEKKO

What is the correct way to handle a multidimensional array? #69