[Closed] Dawntown closed this issue 4 years ago
Hi there, thanks for your report. That error comes directly from CVXOPT. It occurs when there is a numerical problem during optimization (e.g. the combined kernel is not positive semi-definite, or something else), and it may depend on the parameters you use.
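As a minimal illustration of the PSD point above (this is not MKLpy code; it assumes only numpy and scikit-learn): a quick way to check whether a kernel matrix is positive semi-definite is to inspect its smallest eigenvalue. Unlike the linear or RBF kernels, the sigmoid (tanh) kernel is not guaranteed to produce a PSD matrix.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics.pairwise import linear_kernel, sigmoid_kernel

def is_psd(K, tol=1e-8):
    """Return True if the symmetric matrix K is PSD up to tolerance tol."""
    return bool(np.linalg.eigvalsh(K).min() >= -tol)

X = load_iris().data

# A Gram matrix of dot products is PSD by construction.
print(is_psd(linear_kernel(X)))  # True

# The sigmoid kernel is indefinite in general; whether its matrix is PSD
# depends on gamma, coef0, and the data.
print(is_psd(sigmoid_kernel(X, gamma=0.1)))
```

If this check reports a clearly negative smallest eigenvalue for one of your base kernels, that kernel is a likely culprit for CVXOPT's rank error.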
I tried the following simple code:
from MKLpy.algorithms import EasyMKL
from sklearn.metrics.pairwise import sigmoid_kernel
from sklearn.datasets import load_iris as load

data = load()
X, Y = data.data, data.target
KL = [sigmoid_kernel(X, gamma=g) for g in [0.01, 0.1, 1.0]]
clf = EasyMKL(solver='libsvm').fit(KL, Y)  # libsvm solver
clf = EasyMKL().fit(KL, Y)                 # default solver (CVXOPT)
without raising the error.
What can we do to solve this issue? In the meantime, you may try EasyMKL(solver='libsvm').
Thank you for your advice. I have tried different kernel combinations and EasyMKL parameters, and found that certain parameters do trigger this issue.
First, I defined four kernels with fixed parameters and a parameter search space.
import numpy as np
from sklearn.datasets import load_iris as load
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from MKLpy.algorithms import EasyMKL
from MKLpy.preprocessing import kernel_normalization
from sklearn.metrics import pairwise

def linear_k(X):
    return pairwise.linear_kernel(X)

def rbf_k(X):
    return pairwise.rbf_kernel(X, gamma=1/X.shape[1])

def poly_k(X):
    return pairwise.polynomial_kernel(X, degree=3)

def sigm_k(X):
    return pairwise.sigmoid_kernel(X, gamma=1/X.shape[1])

param_grid = {
    'lam': np.linspace(0, 1, 11),
    'C': np.logspace(-4, 2, 7),
}
normalizer = StandardScaler()
When using the iris dataset from sklearn with a combination of all four kernels, every parameter setting works well:
X, y = load().data, load().target
X = normalizer.fit_transform(X)
kernel_list = [linear_k, rbf_k, poly_k, sigm_k]
KL_comb = [k(X) for k in kernel_list]
KL_norm = [kernel_normalization(K) for K in KL_comb]

for lam in param_grid['lam']:
    for C in param_grid['C']:
        try:
            clf = EasyMKL(lam=lam, learner=SVC(C=C), solver='libsvm').fit(KL_norm, y)
        except ValueError as e:
            print("lam: {:.1f}, C: {}\t{}".format(lam, C, e))
When using the iris dataset with the sigmoid kernel alone, lam=0 fails with a ValueError:
kernel_list = [sigm_k]
lam: 0.0, C: 0.0001 Rank(A) < p or Rank([P; A; G]) < n
lam: 0.0, C: 0.001 Rank(A) < p or Rank([P; A; G]) < n
lam: 0.0, C: 0.01 Rank(A) < p or Rank([P; A; G]) < n
lam: 0.0, C: 0.1 Rank(A) < p or Rank([P; A; G]) < n
lam: 0.0, C: 1.0 Rank(A) < p or Rank([P; A; G]) < n
lam: 0.0, C: 10.0 Rank(A) < p or Rank([P; A; G]) < n
lam: 0.0, C: 100.0 Rank(A) < p or Rank([P; A; G]) < n
When using my own dataset, which contains 7 groups of features with fewer than 200 features and more than 1000 samples in total, training takes a very long time (tens of minutes), so I have to kill the process. What is really odd is that the bad parameter settings seem to occur randomly if I don't set a random seed. In most cases, however, the long training times occur with a relatively small lam, whether I use a combination of kernels or the sigmoid kernel alone. In that situation, if I remove the option solver='libsvm', it raises ValueError: Rank(A) < p or Rank([P; A; G]) < n.
Ok, so basically the problem is related to the non-separability of the data. Let me explain.
The effect of lambda is similar to the C of the SVM: both help to solve non-separable problems, and lambda=0 corresponds to C=infinity (a hard margin).
If you force lambda=0 and your problem is not separable, you may encounter that error (Rank(A) < p ...) when using the default solver (CVXOPT), because the optimization problem has infinitely many solutions (see the EasyMKL paper for further details) and you have no control over which one is found.
When you use libsvm as the solver, you do not have this issue during optimization because you are using SMO instead of an interior-point method. However, a well-known issue with SMO is that it struggles to converge when the problem is hardly separable; as a consequence, you get really long training times.
So, basically, you are trying to use a hard-margin method on a problem that strictly needs a soft-margin solution: you just need to set lambda>0. Alternatively, you may consider adding an identity matrix as an additional kernel, which usually alleviates the problem.
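The identity-kernel trick can be sketched as follows (a minimal example assuming only numpy and scikit-learn; the resulting list is what you would pass to EasyMKL):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics.pairwise import sigmoid_kernel

X = load_iris().data
n = X.shape[0]

# The problematic base kernel from the report.
K_sig = sigmoid_kernel(X, gamma=1 / X.shape[1])

# Append the identity matrix as an extra kernel: it is trivially PSD,
# so any combination that puts weight on it adds mass to the diagonal,
# analogous to a ridge/regularization term.
KL = [K_sig, np.eye(n)]

# KL can then be passed to EasyMKL(...).fit(KL, y) as usual.
```

Note that the identity "kernel" only makes sense for training; at prediction time the corresponding train/test block is a zero matrix, since distinct points have no self-similarity entry.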
In any case, thanks for your report; it helps us improve error handling in the next release.
Your explanation was really appreciated! It is reasonable and clear.
I tried both combined kernels and single kernels on my dataset. Whenever I combine the sigmoid kernel with any other non-linear kernel, such as the rbf or poly kernel (or both), the training process still converges very slowly, even with a much larger lam and the default solver. My data are probably not suited to kernel combinations that include the sigmoid kernel, so I have decided to drop it.
Thank you very much.
Hi, thank you for your work on this useful tool! When I use the sigmoid kernel from the scikit-learn package in EasyMKL, the error ValueError: Rank(A) < p or Rank([P; A; G]) < n shows up. I am wondering whether or not the algorithm supports the sigmoid kernel, which is not offered in MKLpy.pairwise. If the entire error information is needed, you can refer to this link.