Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.
Mitigation
Users should pass signed integer types, not unsigned integer types, to the sklearn adapter functions.
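The mitigation above can be applied on the caller's side by casting before calling fit. The following is a minimal sketch, not part of vowpalwabbit: the helper name to_signed is hypothetical, and it assumes a DataFrame with unique column names and NumPy-backed dtypes.

```python
import numpy as np
import pandas as pd


def to_signed(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with unsigned integer columns cast to int64.

    Hypothetical helper for illustration; not a vowpalwabbit API.
    """
    out = df.copy()
    int64_max = np.iinfo(np.int64).max
    for col, dtype in df.dtypes.items():
        if not pd.api.types.is_unsigned_integer_dtype(dtype):
            continue  # leave signed, float, and other columns untouched
        # int64 holds every uint8/uint16/uint32 value. Only uint64 values
        # above int64's maximum cannot be represented, so reject those
        # instead of letting the cast wrap around silently.
        if len(out[col]) and int(out[col].max()) > int64_max:
            raise ValueError(f"column {col!r} has values too large for int64")
        out[col] = out[col].astype(np.int64)
    return out
```

With this helper, the failing call would become VWRegressor().fit(to_signed(X), y); the original X is left unmodified.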
Details
The tovw function uses dump_svmlight_file to convert the input to a format from which VW text examples can be constructed easily.
dump_svmlight_file does not accept unsigned integer input; it requires signed integers because of the Cython (pyx) code used internally by sklearn.
Fails:
from vowpalwabbit.sklearn import VWRegressor
import numpy as np
import pandas as pd
X = pd.DataFrame({'a': [1]}, dtype='uint32')
y = pd.Series(np.zeros(1))
VWRegressor().fit(X, y)
Succeeds:
from vowpalwabbit.sklearn import VWRegressor
import numpy as np
import pandas as pd
X = pd.DataFrame({'a': [1]}, dtype='int32')  # signed dtype; this is the only change
y = pd.Series(np.zeros(1))
VWRegressor().fit(X, y)
The same input works when passed to SKLearn itself:
from sklearn.linear_model import LinearRegression
import numpy as np
import pandas as pd
X = pd.DataFrame({'a': [1]}, dtype='uint32')
y = pd.Series(np.zeros(1))
LinearRegression().fit(X, y)
One way to fix this is to avoid using the dump_svmlight_file function, which is currently used as an easy way to convert the DataFrame to VW text format.
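As a rough illustration of what a conversion without dump_svmlight_file might look like, the sketch below builds VW text lines of the form "label | name:value" directly from the DataFrame. It is not the adapter's actual implementation: the function name to_vw_lines is hypothetical, and it assumes numeric columns, a simple regression label, and column names that contain no spaces, ':' or '|' characters.

```python
import pandas as pd


def to_vw_lines(X: pd.DataFrame, y: pd.Series) -> list:
    """Build one VW text example per row (hypothetical sketch)."""
    names = [str(c) for c in X.columns]
    lines = []
    for row, label in zip(X.to_numpy(), y.to_numpy()):
        # Converting through float() makes the dtype irrelevant, so
        # unsigned integer columns are handled like any other number.
        # Zero-valued features are skipped, as in a sparse text format.
        feats = " ".join(
            f"{name}:{float(value)}"
            for name, value in zip(names, row)
            if value != 0
        )
        lines.append(f"{float(label)} | {feats}")
    return lines
```

For the failing input above (a single uint32 column 'a' with value 1 and label 0), this yields the line "0.0 | a:1.0".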