VowpalWabbit / vowpal_wabbit

Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.
https://vowpalwabbit.org

Quantile Loss for support vector regression or at least for usual linear regression #3065

Open Sandy4321 opened 3 years ago

Sandy4321 commented 3 years ago

Description

It is not clear how to find an explanation or a Python example of how to use quantile loss for support vector regression, or at least for ordinary linear regression.

Link to Documentation Page

https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Loss-functions

lokitoth commented 3 years ago

Hi @Sandy4321 thanks for filing this issue. Could you expand a bit on what you would like to see here? Is the question about how to enable "quantile" loss when using VW in Python, or is it something else?

Sandy4321 commented 3 years ago

I am asking for a description and more details on how to use quantile loss for support vector regression, or at least for ordinary linear regression,

or at least a code example for Python, please.

lokitoth commented 3 years ago

Sorry, I am still a bit confused about the specific question here. Switching the loss function to "quantile" (or others) in Python is done the same way as setting any command-line argument:

model = pyvw.vw(loss_function="quantile")

Is the question about better documentation for how to configure various vw options in Python? Or is it about how to think about Quantile Regression in general?
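As a rough illustration of what "the same way as setting any command-line argument" means, the sketch below shows one plausible mapping from Python keyword arguments to VW command-line flags. The helper `kwargs_to_cli` is hypothetical and written only for this illustration; it is not part of vowpalwabbit, and the real package may treat edge cases differently.

```python
def kwargs_to_cli(**kwargs):
    """Approximate mapping from keyword options to a VW-style command line.

    Hypothetical helper for illustration only, not part of vowpalwabbit.
    """
    parts = []
    for key, value in kwargs.items():
        # Single-letter options get one dash, longer names get two.
        flag = ("-" if len(key) == 1 else "--") + key
        # A list repeats the flag once per element, e.g. q=["ab", "ac"].
        values = value if isinstance(value, (list, tuple)) else [value]
        for v in values:
            if v is True:
                parts.append(flag)           # boolean switch, no argument
            elif v is False or v is None:
                continue                     # option left out entirely
            else:
                parts.append(f"{flag} {v}")  # option with an argument
    return " ".join(parts)


print(kwargs_to_cli(loss_function="quantile", b=24, audit=True, q=["ab", "ac"]))
# --loss_function quantile -b 24 --audit -q ab -q ac
```

Under this reading, `loss_function="quantile"` in Python corresponds to `--loss_function quantile` on the command line.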

Sandy4321 commented 3 years ago

Yes, the question is about better documentation for how to configure the various vw options in Python.

It would be great to have a full example, from start to end, of quantile regression in Python. For example: given such a data file, the Python code to use is: .....

the predicted data is: ....

the mean absolute error is ...., the confidence intervals are: ....

A description in general form always leaves something unclear.

Some effort has been made in this direction, for example https://vowpalwabbit.org/docs/vowpal_wabbit/python/latest/vowpalwabbit.pyvw.html

from vowpalwabbit import pyvw
vw1 = pyvw.vw('--audit')
vw2 = pyvw.vw(audit=True, b=24, k=True, c=True, l2=0.001)
vw3 = pyvw.vw("--audit", b=26)
vw4 = pyvw.vw(q=["ab", "ac"])

but it would be really great to have full python code example

thanks a lot for taking care
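For the evaluation quantities mentioned above (mean absolute error and intervals), here is a minimal sketch in plain Python that does not use VW at all. It assumes the predictions have already been produced by some model; the function names and the toy numbers are made up for illustration. The pinball loss is the usual definition of quantile loss. An interval built from two quantile predictions (for example a low and a high quantile) is a prediction interval, which may or may not be what "confidence intervals" was meant to cover.

```python
def pinball_loss(y_true, y_pred, tau):
    """Average quantile (pinball) loss for a quantile level tau in (0, 1)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-prediction is weighted by tau, over-prediction by (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def interval_coverage(y_true, lower, upper):
    """Fraction of targets that fall inside [lower, upper]."""
    hits = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return hits / len(y_true)


# Toy numbers, invented for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
median_pred = [1.5, 2.0, 2.5, 5.0]   # predictions of a tau = 0.5 model
lower_pred = [0.5, 1.5, 3.2, 3.0]    # predictions of a low-quantile model
upper_pred = [2.0, 3.0, 4.0, 4.5]    # predictions of a high-quantile model

print(mean_absolute_error(y_true, median_pred))
print(pinball_loss(y_true, median_pred, tau=0.5))
print(interval_coverage(y_true, lower_pred, upper_pred))
```

At tau = 0.5 the pinball loss is half the mean absolute error, which is why median regression and MAE minimization coincide.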

Sandy4321 commented 3 years ago

I was able to find only very limited examples in Python; this one, https://vowpalwabbit.org/tutorials/python_first_steps.html, is very concise.

Sandy4321 commented 3 years ago

At least something like this, but for quantile regression:

https://pypi.org/project/vowpalwabbit/

import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from vowpalwabbit.sklearn_vw import VWClassifier

# generate some data
X, y = datasets.make_hastie_10_2(n_samples=10000, random_state=1)
X = X.astype(np.float32)

# split train and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=256)

# build model
model = VWClassifier()
model.fit(X_train, y_train)

# predict model
y_pred = model.predict(X_test)

# evaluate model
model.score(X_train, y_train)
model.score(X_test, y_test)

Sandy4321 commented 3 years ago

By the way, at the bottom of https://pypi.org/project/vowpalwabbit/ there is the line "python/examples : example python code and jupyter notebooks to demonstrate functionality".

Could you clarify how to find this folder?

Sandy4321 commented 3 years ago

All right, I found this folder.

Then it would be great to share an example for quantile regression in this style: https://github.com/VowpalWabbit/vowpal_wabbit/blob/master/python/examples/poisson_regression.ipynb

lalo commented 3 years ago

Yes, it is not entirely clear. It refers to the directories in the same location as that text file in the repository, which makes it extra confusing in the pypi.org documentation. For a slightly better experience, you can see those docs over here: https://github.com/VowpalWabbit/vowpal_wabbit/tree/master/python

The readme is in vowpal_wabbit/python/README.rst https://github.com/VowpalWabbit/vowpal_wabbit/tree/master/python

The python/examples would be in vowpal_wabbit/python/examples https://github.com/VowpalWabbit/vowpal_wabbit/tree/master/python/examples

Tests: https://github.com/VowpalWabbit/vowpal_wabbit/tree/master/python/tests

lalo commented 3 years ago

We also have these autogen docs: https://vowpalwabbit.org/docs/ https://vowpalwabbit.org/docs/vowpal_wabbit/python/latest/

lokitoth commented 3 years ago

In the scikit-learn case, the configuration options are passed in the same way as for pyvw.

So, if you want to have VWClassifier run with quantile loss, you would specify:

classifier_model = VWClassifier(loss_function='quantile')

#or

regressor_model = VWRegressor(loss_function='quantile')

Here is a deep link for VWClassifier and one for VWRegressor to the class documentation

I suspect that we probably will not make a specific tutorial for just quantile loss because it seems like there would be a lot of tutorials that only differ from one-another by the specific combination of options they use. Would a general tutorial about how to pass options to VW when using in Python in pyvw / scikit modes make sense here @Sandy4321, or alternatively a tutorial that explores the various things you can do in the context of regression specifically?

Sandy4321 commented 3 years ago

At this link, https://vowpalwabbit.org/docs/vowpal_wabbit/python/latest/vowpalwabbit.sklearn.html#vowpalwabbit.sklearn_vw.VWRegressor, I see no example for the regressor, though there is an example for the classifier.

Sandy4321 commented 3 years ago

Would a general tutorial about how to pass options to VW when using in Python in pyvw / scikit modes make sense here @Sandy4321, or alternatively a tutorial that explores the various things you can do in the context of regression specifically?

yes would be great to have one

for example in https://github.com/VowpalWabbit/vowpal_wabbit/blob/master/python/tests/test_sklearn_vw.py

def test_lrq(self):
    X = ['1 |user A |movie 1',
         '2 |user B |movie 2',
         '3 |user C |movie 3',
         '4 |user D |movie 4',
         '5 |user E |movie 1']
    model = VW(convert_to_vw=False, lrq='um4', lrqdropout=True, loss_function='quantile')
    assert getattr(model, 'lrq') == 'um4'
    assert getattr(model, 'lrqdropout')
    model.fit(X)
    prediction = model.predict([' |user C |movie 1'])
    assert np.allclose(prediction, [3.], atol=1)

It is not at all clear why lrq='um4' is used: what does um4 mean, and how can people who are not familiar with VW and are only starting to learn it find answers to this kind of question?

It is difficult to search Google for the meaning of lrq, since it is only three letters.

Sandy4321 commented 3 years ago

Even Stack Overflow cannot help: https://stackoverflow.com/questions/44298795/one-time-vs-iteration-model-in-vowpal-wabbit-with-lrq-option

lokitoth commented 3 years ago

why um4 , what is it um4 and how to find answer on this kind of questions for people who is not familiar with VW but only starting to learn VW

The command-line options documentation is fairly sparse here, but here are some links to get you started with LRQ:

Putting together a more coherent list of issues that can be addressed from this:

Sandy4321 commented 3 years ago

Great, thanks for the help.

So lrq='um4' is not related to loss_function='quantile' in model = VW(convert_to_vw=False, lrq='um4', lrqdropout=True, loss_function='quantile').
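On the `um4` string itself: as the `--lrq` option (low-rank quadratic interactions) is commonly described, its argument is two namespace first letters followed by a rank, so `um4` would pair the namespaces starting with `u` (user) and `m` (movie) at rank 4, independently of the loss function. The parser below is a hypothetical helper, not VW code, and only illustrates that reading of the string.

```python
import re


def parse_lrq(spec):
    """Split an --lrq style argument such as 'um4' into its three parts.

    Hypothetical helper for illustration; assumes the form
    <first namespace letter><second namespace letter><rank>.
    """
    match = re.fullmatch(r"(\S)(\S)(\d+)", spec)
    if match is None:
        raise ValueError(f"unexpected lrq spec: {spec!r}")
    first, second, rank = match.groups()
    return first, second, int(rank)


print(parse_lrq("um4"))  # ('u', 'm', 4)
print(parse_lrq("um7"))  # ('u', 'm', 7)
```

Read this way, the `um7` in the Makefile command and the `um4` in the test differ only in the rank of the user-movie interaction.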

My guess is that L1 or L2 regularization may also be added to this line, model = VW(convert_to_vw=False, lrq='um4', lrqdropout=True, loss_function='quantile'), similar to the use of --l2 in

```
@${VW} --loss_function quantile -l 0.1 -b 24 --passes 100 \
    -k --cache_file $@.cache -d $(word 2,$+) --holdout_off \
    --power_t 0.333 --l2 1.25e-7 --lrq um7 --adaptive --invariant -f $@.model
```

In general VW is a really great package! But for Python users it would be crucial to have examples coded in Python from start to end, starting from reading data from a file and ending with a demonstration of prediction quality,

since for a Python coder, understanding a Makefile like https://github.com/VowpalWabbit/vowpal_wabbit/blob/master/demo/movielens/Makefile well enough to do the task is impossible.
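As a possible starting point for the "reading data from a file" step, the sketch below converts CSV rows into VW text-format lines of the kind used in the test quoted earlier ('1 |user A |movie 1'). The helper names, the single namespace `f`, and the sample data are assumptions made for illustration. It uses only the standard library and stops short of calling VW.

```python
import csv
import io


def row_to_vw(label, features, namespace="f"):
    """Format one example as 'label |namespace name:value ...'.

    Feature names must not contain spaces, ':' or '|'.
    Pass label=None to build an unlabeled line for prediction.
    """
    feats = " ".join(f"{name}:{value}" for name, value in features.items())
    prefix = f"{label} " if label is not None else ""
    return f"{prefix}|{namespace} {feats}"


def csv_to_vw(text, label_column):
    """Turn CSV text with a header row into a list of VW-format lines."""
    reader = csv.DictReader(io.StringIO(text))
    lines = []
    for row in reader:
        label = row.pop(label_column)  # the remaining columns are features
        lines.append(row_to_vw(label, row))
    return lines


# Made-up sample data standing in for a real file's contents.
SAMPLE = "y,x1,x2\n1.5,0.2,3\n2.0,0.4,1\n"
lines = csv_to_vw(SAMPLE, label_column="y")
for line in lines:
    print(line)
```

Lines like these are what the quoted test passes as strings when `convert_to_vw=False` is set; for a real file, the text would come from `open(path).read()` instead of `SAMPLE`.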