IvanoLauriola / MKLpy

A package for Multiple Kernel Learning in Python
GNU General Public License v3.0

Does this implementation of EasyMKL not support the sigmoid kernel from `sklearn.pairwise`? #19

Closed Dawntown closed 4 years ago

Dawntown commented 4 years ago

Hi, thank you for your work on this useful tool! When I use the sigmoid kernel from the scikit-learn package in EasyMKL, a ValueError: Rank(A) < p or Rank([P; A; G]) < n shows up. I am wondering whether the algorithm supports the sigmoid kernel, which is not offered in MKLpy.pairwise.

If the entire error message is needed, you can refer to this link.

IvanoLauriola commented 4 years ago

Hi there, thanks for your report. That error comes directly from CVXOPT. It occurs when there is a numerical problem during optimization (e.g. the combined kernel is not positive semi-definite), and it may depend on the parameters you use.
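As a quick sanity check (a sketch, not an MKLpy utility), you can inspect the smallest eigenvalue of a kernel matrix; the sigmoid kernel is not guaranteed to be positive semi-definite, so a negative value here is the usual culprit:

import numpy as np
from sklearn.metrics.pairwise import sigmoid_kernel
from sklearn.datasets import load_iris

# The sigmoid (tanh) kernel is not PSD in general; a negative smallest
# eigenvalue signals the kind of matrix that can break CVXOPT.
X = load_iris().data
K = sigmoid_kernel(X, gamma=0.1)
print('smallest eigenvalue:', np.linalg.eigvalsh(K).min())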

I tried the following simple code

from MKLpy.algorithms import EasyMKL
from sklearn.metrics.pairwise import sigmoid_kernel
from sklearn.datasets import load_iris as load

data = load()
X, Y = data.data, data.target
KL = [sigmoid_kernel(X, gamma=g) for g in [0.01, 0.1, 1.0]]

clf = EasyMKL(solver='libsvm').fit(KL, Y)  # SMO-based libsvm solver
clf = EasyMKL().fit(KL, Y)                 # default CVXOPT solver

without raising the error.

What can we do to solve this issue?

  1. check your code and try to use the libsvm solver, `EasyMKL(solver='libsvm')`
  2. submit a complete bug report with your code so I can replicate the error (see here)
Dawntown commented 4 years ago

Thank you for your advice. I have tried different kernel combinations and EasyMKL parameters, and found that certain parameters do trigger this issue.

First, I defined four kernels with fixed parameters and a parameter search space.

import numpy as np
from sklearn.datasets import load_iris as load
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from MKLpy.algorithms import EasyMKL
from MKLpy.preprocessing import kernel_normalization
from sklearn.metrics import pairwise

def linear_k(X):
    return pairwise.linear_kernel(X)

def rbf_k(X):
    return pairwise.rbf_kernel(X, gamma=1/X.shape[1])

def poly_k(X):
    return pairwise.polynomial_kernel(X, degree=3)

def sigm_k(X):
    return pairwise.sigmoid_kernel(X, gamma=1/X.shape[1])

param_grid = {
        'lam': np.linspace(0,1,11),
        'C': np.logspace(-4,2,7)
}

normalizer = StandardScaler()

When using the iris dataset from sklearn and a combination of all four kernels, all parameters work well:

X, y = load().data, load().target
X = normalizer.fit_transform(X)
kernel_list = [linear_k, rbf_k, poly_k, sigm_k]
KL_comb = [k(X) for k in kernel_list]
KL_norm = [kernel_normalization(K) for K in KL_comb]
for lam in param_grid['lam']:
    for C in param_grid['C']:
        try:
            # fit on the normalized kernel list computed above
            clf = EasyMKL(lam=lam, learner=SVC(C=C), solver='libsvm').fit(KL_norm, y)
        except ValueError as e:
            print("lam: {:.1f}, C: {}\t{}".format(lam, C, e))

When using the iris dataset from sklearn and the sigmoid kernel alone, lam=0 fails with a ValueError:

kernel_list = [sigm_k]
lam: 0.0, C: 0.0001 Rank(A) < p or Rank([P; A; G]) < n
lam: 0.0, C: 0.001  Rank(A) < p or Rank([P; A; G]) < n
lam: 0.0, C: 0.01   Rank(A) < p or Rank([P; A; G]) < n
lam: 0.0, C: 0.1    Rank(A) < p or Rank([P; A; G]) < n
lam: 0.0, C: 1.0    Rank(A) < p or Rank([P; A; G]) < n
lam: 0.0, C: 10.0   Rank(A) < p or Rank([P; A; G]) < n
lam: 0.0, C: 100.0  Rank(A) < p or Rank([P; A; G]) < n

When using my own dataset, which contains 7 groups of features with fewer than 200 features and more than 1000 samples in total, training takes a very long time, on the order of tens of minutes, so I just have to kill the runs. What is really odd is that the bad parameter settings seemed to occur randomly when I didn't set a random seed. In most cases, however, the long training times occurred with a relatively small lam, whether I used a combination of kernels or the sigmoid kernel alone. In that situation, if I remove the solver='libsvm' option, it raises ValueError: Rank(A) < p or Rank([P; A; G]) < n.

IvanoLauriola commented 4 years ago

Ok, so basically the problem is related to the non-separability of the data. Let me explain.

The effect of lambda is similar to that of the C in an SVM: both help to solve non-separable problems, and lambda=0 corresponds to C = infinity. If you force lambda=0 and your problem is not separable, you may encounter that error (Rank(A) < p ...) when using the default solver (CVXOPT), because the optimization problem has infinitely many solutions (see the EasyMKL paper for further details) and you cannot have control over it. When you use libsvm as the solver you do not have this issue during optimization, since you are using SMO instead of an interior-point method. However, a well-known issue with SMO is that it is really hard to reach convergence when the problem is barely separable. As a consequence, you get really long training times.
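For instance (a sketch reusing KL_norm and y from your snippet, with a hypothetical lam value), any strictly positive lambda should make the CVXOPT problem well-posed:

# lam > 0 restores the soft-margin formulation, so the default CVXOPT
# solver no longer hits the rank error, even with the sigmoid kernel alone
clf = EasyMKL(lam=0.1).fit(KL_norm, y)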

So, basically, you are trying to use a hard-margin method while your problem strictly needs a soft-margin solution, so you just need to set lambda>0. Alternatively, you may consider adding an identity matrix as an extra kernel, as sketched below. This usually alleviates the problem.
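A minimal sketch of the identity-kernel trick, again assuming the KL_norm and y variables from your snippet:

import numpy as np

# Appending the identity as an extra base kernel lets the learner give it
# some weight, which regularizes the combination and keeps the combined
# training kernel strictly positive definite.
# Note: for test points this extra "kernel" is simply all zeros against
# the training samples, so it only acts as a regularizer.
n = KL_norm[0].shape[0]
KL_reg = KL_norm + [np.eye(n)]
clf = EasyMKL().fit(KL_reg, y)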

In any case, thanks for your report; it helps us improve error handling for the next release.

Dawntown commented 4 years ago

Your explanation is really appreciated! It is reasonable and clear.

I tried both combined kernels and single kernels on my dataset. Whenever I combined the sigmoid kernel with any non-linear kernel, such as the rbf kernel, the poly kernel, or both, the training still converged very slowly, even with a much larger lam and the default solver... My data are probably not suited to kernel combinations that include the sigmoid kernel, so I have decided to drop it.

Thank you very much.