JuliaML / LIBLINEAR.jl

LIBLINEAR bindings for Julia

support regression models #38

Open barucden opened 6 months ago

barucden commented 6 months ago

Fixes #37

This adds support for regression models. However, the regression models that liblinear returns do not seem to be very accurate.

For example, the following scikit-learn code:

```python
from sklearn.svm import LinearSVR
import numpy as np

X = np.random.rand(10000, 1)
y = (2 * X)[:, 0]
m = LinearSVR(loss='squared_epsilon_insensitive', dual=False, verbose=1, fit_intercept=False)
m.fit(X, y)

print(m.coef_)
```

prints

```
iter  1 act 1.332e+04 pre 1.332e+04 delta 2.000e+00 f 1.332e+04 |g| 1.332e+04 CG   1
[LibLinear][1.99969976]
```

(that is, the linear coefficient is recovered accurately; the true value is 2)

In contrast, the following Julia code

```julia
using LIBLINEAR

X = rand(1, 10000)
y = vec(2 .* X)

m = linear_train(y, X, solver_type=LIBLINEAR.L2R_L2LOSS_SVR, verbose=true)

println(m.w)
```

prints

```
init f 3.334e+11 |g| 4.989e+07
iter  1 f 1.462e+11 |g| 1.004e+03 CG   2 step_size 1.00e+00
[7503.24992010474]
```

with the current PR, so the linear coefficient is completely wrong (about 7503 instead of 2). As far as I can tell from my investigation, this is the value that liblinear itself returns. scikit-learn seems to use a different solver than liblinear, but I am not sure that is the only issue.
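
As a sanity check on what coefficient to expect, here is a minimal pure-Python sketch. It assumes the objective being minimized is the L2-regularized L2-loss SVR objective, 0.5 * w^2 + C * sum(max(0, |y_i - w * x_i| - epsilon)^2), with C = 1 and epsilon = 0 (the `LinearSVR` defaults used in the scikit-learn snippet above; the defaults in LIBLINEAR.jl may differ). Under those assumptions the one-dimensional minimizer has the closed form w = 2C * sum(x_i * y_i) / (1 + 2C * sum(x_i^2)). The function names below are illustrative and not part of either library.

```python
import random


def expected_coefficient(xs, ys, C=1.0):
    """Closed-form minimizer of 0.5*w**2 + C*sum((y - w*x)**2) for scalar w.

    Assumes epsilon = 0 and no intercept (hypothetical helper, not a library API).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return 2.0 * C * sxy / (1.0 + 2.0 * C * sxx)


def objective(w, xs, ys, C=1.0):
    """Value of the assumed objective at w (epsilon = 0)."""
    return 0.5 * w * w + C * sum((y - w * x) ** 2 for x, y in zip(xs, ys))


random.seed(0)
xs = [random.random() for _ in range(10000)]  # same shape of data as in the thread
ys = [2.0 * x for x in xs]                    # true coefficient is 2

w_star = expected_coefficient(xs, ys)
f0 = objective(0.0, xs, ys)  # objective at the starting point w = 0

print(w_star)  # slightly below 2 because of the regularization term
print(f0)      # on the order of 1e4 for this data
```

Under these assumptions the minimizer sits just below 2 and the objective at w = 0 is on the order of 1e4, which is consistent with the scikit-learn log (`f 1.332e+04`). The `init f 3.334e+11` in the Julia log is several orders of magnitude larger, which may hint that the two runs are not solving the same problem (different parameters or a different data layout), though this sketch cannot tell which.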

Also: linear_predict is not really type-stable, because the output type depends on solver_type. For one-class SVM, the output is a pair of Vector{String} and Vector{Float64}. For regression models, it is Vector{Float64} and Vector{Float64} (I made it return the same vector twice). For other models, it is Vector{typeof(labels)} and Vector{Float64}.