Closed by siavashserver 6 years ago
@siavashserver, you can pass bias_used=False when initializing the class. It helped me with the same exception when I used RVC. Unfortunately, I can't explain why it helps or how it changes the algorithm in the abstract. I just read the sources and found the cause of the exception.
So, this should work:
from skrvm import RVR
X = [[2.1, 4], [2, 2]]
y = [0.5, 2.5 ]
clf = RVR(kernel='linear', bias_used=False)
clf.fit(X, y)
print(clf.predict([[1, 1]]))
@Alpus Thank you very much for the help, that indeed gets rid of the nasty error! :)
I just gave the code sample on the home page a try with bias_used=False, and it returns 1.20 instead of 1.49. No idea about its effect on more complex data and other kernels; I should give it a try later.
For future Googlers, there is also https://github.com/AmazaspShumik/sklearn-bayes as an actively maintained alternative.
As for sklearn-bayes, the code is not compatible with Python 3.7.
In my experience, the error only arises if kernel='linear' and the feature dimension is equal to 1.
As mentioned above, the error is not triggered when bias_used is set to False.
Another solution would be to add a dummy dimension to the features.
For instance, the following code:
import matplotlib.pyplot as plt
import numpy as np
from skrvm import RVR
# parameters
n = 500
# generate data set
np.random.seed(0)
X = np.ones([n, 1])
X[:, 0] = np.linspace(-5, 5, n)
y = 10 * np.sinc(X[:, 0]) + np.random.normal(0, 1, n)
# train rvr
rvm = RVR()
rvm.fit(X, y)
y_hat = rvm.predict(X)
# plot test vs predicted data
plt.figure()
plt.plot(X[:, 0], y, "b+", markersize=3, label="test data")
plt.plot(X[:, 0], y_hat, "rD", markersize=3, label="mean of predictive distribution")
plt.show()
returns:
However, with a linear kernel:
import matplotlib.pyplot as plt
import numpy as np
from skrvm import RVR
# parameters
n = 500
# generate data set
np.random.seed(0)
X = np.ones([n, 1])
X[:, 0] = np.linspace(-5, 5, n)
y = 10 * np.sinc(X[:, 0]) + np.random.normal(0, 1, n)
# train rvr
rvm = RVR(kernel='linear')
rvm.fit(X, y)
y_hat = rvm.predict(X)
# plot test vs predicted data
plt.figure()
plt.plot(X[:, 0], y, "b+", markersize=3, label="test data")
plt.plot(X[:, 0], y_hat, "rD", markersize=3, label="mean of predictive distribution")
plt.show()
returns:
ValueError: Found array with 0 sample(s) (shape=(0, 1)) while a minimum of 1 is required by check_pairwise_arrays.
First solution:
import matplotlib.pyplot as plt
import numpy as np
from skrvm import RVR
# parameters
n = 500
# generate data set
np.random.seed(0)
X = np.ones([n, 1])
X[:, 0] = np.linspace(-5, 5, n)
y = 10 * np.sinc(X[:, 0]) + np.random.normal(0, 1, n)
# train rvr
rvm = RVR(kernel='linear', bias_used=False)
rvm.fit(X, y)
y_hat = rvm.predict(X)
# plot test vs predicted data
plt.figure()
plt.plot(X[:, 0], y, "b+", markersize=3, label="test data")
plt.plot(X[:, 0], y_hat, "rD", markersize=3, label="mean of predictive distribution")
plt.show()
returns
Second solution:
import matplotlib.pyplot as plt
import numpy as np
from skrvm import RVR
# parameters
n = 500
# generate data set
np.random.seed(0)
X = np.ones([n, 2]) # <--- I have added a dummy second feature dimension.
X[:, 0] = np.linspace(-5, 5, n)
y = 10 * np.sinc(X[:, 0]) + np.random.normal(0, 1, n)
# train rvr
rvm = RVR(kernel='linear') # <--- I have removed bias_used=False
rvm.fit(X, y)
y_hat = rvm.predict(X)
# plot test vs predicted data
plt.figure()
plt.plot(X[:, 0], y, "b+", markersize=3, label="test data")
plt.plot(X[:, 0], y_hat, "rD", markersize=3, label="mean of predictive distribution")
plt.show()
returns
Notice the slight slope of the line, likely due to the bias.
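The dummy-dimension workaround can also be wrapped in a small helper. This is only a sketch: add_dummy_feature is a hypothetical name, not part of skrvm, and it assumes X is a 2-D array of shape (n_samples, n_features).

```python
import numpy as np


def add_dummy_feature(X):
    """Append a constant column of ones to a 2-D feature array.

    Hypothetical helper (not part of skrvm); assumes X has shape
    (n_samples, n_features).
    """
    X = np.asarray(X, dtype=float)
    if X.ndim != 2:
        raise ValueError("expected a 2-D array, got %d dimension(s)" % X.ndim)
    ones = np.ones((X.shape[0], 1))
    return np.hstack([X, ones])


# Same 1-D inputs as in the examples above, reshaped to (n, 1).
X = np.linspace(-5, 5, 500).reshape(-1, 1)
X_padded = add_dummy_feature(X)
print(X_padded.shape)  # (500, 2)
```

X_padded can then be passed to fit in place of X. The same padding has to be applied to any array passed to predict, so that the number of features matches.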
The error arises because of this check in scikit-learn.
As mentioned in the documentation, arrays are expected to be at least 2-dimensional.
def check_pairwise_arrays(X, Y, precomputed=False, dtype=None):
    """ Set X and Y appropriately and checks inputs

    Specifically, this function first ensures that both X and Y are arrays,
    then checks that they are at least two dimensional while ensuring that
    their elements are floats (or dtype if provided). Finally, the function
    checks that the size of the second dimension of the two arrays is equal, or
    the equivalent check for a precomputed distance matrix.
    """
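To see what kind of input produces this message, here is a minimal sketch of a sample-count check. check_min_samples is a hypothetical stand-in written for illustration, not scikit-learn's actual implementation; it only reproduces the wording of the error above for an array with zero rows.

```python
import numpy as np


def check_min_samples(X, min_samples=1, caller="check_pairwise_arrays"):
    """Hypothetical stand-in for a sample-count check; not scikit-learn's code."""
    X = np.asarray(X, dtype=float)
    if X.ndim < 2:
        raise ValueError("expected at least a 2-D array, got %d-D" % X.ndim)
    n_samples = X.shape[0]
    if n_samples < min_samples:
        raise ValueError(
            "Found array with %d sample(s) (shape=%s) while a minimum of %d "
            "is required by %s." % (n_samples, X.shape, min_samples, caller)
        )
    return X


# An array with zero rows and one column, i.e. shape (0, 1).
try:
    check_min_samples(np.empty((0, 1)))
    message = None
except ValueError as exc:
    message = str(exc)
print(message)
```

The shape (0, 1) in the message means the array reaching the check had one feature column but no rows at all, so the 1-D feature vector itself is not what is being rejected.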
@woctezuma I ended up writing my own implementation for my master's thesis: neonrvm (shameless self-advertisement).
It's written in C, with Python bindings. To speed up the learning process, training data can be fed incrementally.
One major problem with these methods (SVM/RVM) is dealing with singular matrices during factorization. There are also hyperparameters with big search spaces to tune, where a change as small as 1e-3 can make a big difference in model performance.
When comparing RVM with other machine learning methods, I found gradient boosted decision trees (XGBoost, LightGBM, ...) to be more reliable and easier to use.
Thanks for the links!
Hi. Why am I getting this error:
with the following code:
Am I using it correctly?