how to handle multi-classification using one-vs-rest method?

cjlin1 / liblinear

LIBLINEAR -- A Library for Large Linear Classification

https://www.csie.ntu.edu.tw/~cjlin/liblinear/

BSD 3-Clause "New" or "Revised" License

how to handle multi-classification using one-vs-rest method? #21

Open jm-huang opened 8 years ago

jm-huang commented 8 years ago

I am a little confused about how to use this package for multi-class classification. Can anyone tell me how to do it? Thanks.

What I tried:

train_labels=[[1,2], [2], [3]]
train_datas = [[1,1,0], [1,2,2], [1,1,1]]
prob = problem(train_labels, train_datas)
param = parameter('-s 0')
model = train(prob, param)

But it raises an error:

Traceback (most recent call last):
  File "C:\Users\Jiaming\Dropbox\Internship in ADSC\DeepWalk\experiments\classifier.py", line 69, in process
    prob = problem(train_labels, train_datas)
  File "C:\Users\Jiaming\Anaconda2\lib\site-packages\liblinear-210-py2.7.egg\liblinear\liblinear.py", line 107, in __init__
    for i, yi in enumerate(y): self.y[i] = y[i]
TypeError: a float is required

cjlin1 commented 8 years ago

You don't need to handle it yourself. LIBLINEAR directly supports one-vs-rest.
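
To illustrate what one-vs-rest means for the label format: each instance carries a single numeric label, and the multi-class problem is split into one binary problem per class. The following is a minimal pure-Python sketch of that decomposition under those assumptions, not LIBLINEAR's actual code; the helper name `one_vs_rest_targets` is hypothetical.

```python
def one_vs_rest_targets(labels):
    """Map a flat list of class labels to one +1/-1 target list per class.

    Hypothetical helper for illustration only; a library that supports
    one-vs-rest does this splitting internally.
    """
    classes = sorted(set(labels))
    return {k: [1 if y == k else -1 for y in labels] for k in classes}


# One scalar label per instance (not a list of labels per instance).
train_labels = [1, 2, 3]
targets = one_vs_rest_targets(train_labels)
for k, t in targets.items():
    print(k, t)
```

A nested list such as `[[1,2], [2], [3]]` does not fit this scheme, because each entry must be a single number.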

jm-huang commented 8 years ago

@cjlin1 I think I am using the wrong data format. Can you show me the format of the training data file, or the correct format for "problem"? Thanks.

jm-huang commented 8 years ago

@cjlin1 So how can I use the train() method in Python to do that?

jm-huang commented 8 years ago

Can you kindly tell me how to set the format of "train_labels" for multi-class? What I did gives wrong results. Thanks very much.

train_labels=[[1,2], [2], [3]]
train_datas = [[1,1,0], [1,2,2], [1,1,1]]
prob = problem(train_labels, train_datas)
param = parameter('-s 0')
model = train(prob, param)

jm-huang commented 8 years ago

I am using this format: the first part is the labels, and the second part is the features. train.txt: 1,2,3 3 4 5 2 4 5 5 3 3 4 5

Running "train train.txt" outputs: Wrong input format at line 1. I just don't know how to handle this error.
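
For reference, the `train` command-line tool reads the sparse LIBSVM text format, one instance per line: a single numeric label followed by `index:value` pairs with 1-based, ascending indices. A comma-separated label list such as `1,2,3` and features without indices do not fit this format, which likely explains the message above. Below is a small sketch that writes dense rows in that format; the helper name `to_libsvm_line` is hypothetical, and skipping zero-valued features is a convention rather than a requirement.

```python
def to_libsvm_line(label, features):
    """Format one instance as '<label> <index>:<value> ...' (1-based indices).

    Hypothetical helper; zero-valued features are omitted (sparse format).
    """
    pairs = ["%d:%g" % (i, v) for i, v in enumerate(features, start=1) if v != 0]
    return " ".join(["%g" % label] + pairs)


# One scalar label per instance, dense feature rows.
train_labels = [1, 2, 3]
train_datas = [[1, 1, 0], [1, 2, 2], [1, 1, 1]]

lines = [to_libsvm_line(y, x) for y, x in zip(train_labels, train_datas)]
print("\n".join(lines))
```

Writing `lines` to `train.txt`, one per line, should give a file the tool can parse, assuming each instance has exactly one label.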

rofuyu commented 7 years ago

You have a multi-label dataset (more than one positive label per instance) rather than a multi-class dataset. LIBLINEAR currently supports only multi-class classification. If the number of labels in your case is small, you can relabel each label set with its index in the power set, e.g., {1} -> 1, {2} -> 2, {3} -> 3, {1,2} -> 4, {1,3} -> 5, {2,3} -> 6, {1,2,3} -> 7.
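
The power-set relabeling described above can be sketched in a few lines of Python. This is only an illustration: the helper name `power_set_index` is hypothetical, and the subsets are ordered by size and then lexicographically so that the result matches the mapping in the example ({1,2} -> 4 and so on).

```python
from itertools import combinations


def power_set_index(all_labels):
    """Assign 1, 2, 3, ... to every non-empty subset of all_labels.

    Hypothetical helper; subsets are enumerated by size, then in sorted order.
    """
    ordered = sorted(all_labels)
    index = {}
    for size in range(1, len(ordered) + 1):
        for subset in combinations(ordered, size):
            index[frozenset(subset)] = len(index) + 1
    return index


index = power_set_index([1, 2, 3])

# Multi-label targets from the question, converted to one class per instance.
multi_labels = [[1, 2], [2], [3]]
flat_labels = [index[frozenset(ls)] for ls in multi_labels]
print(flat_labels)  # [4, 2, 3]
```

The resulting `flat_labels` is a plain list of numbers, one per instance. Inverting `index` maps a predicted class back to its label set. The number of classes grows as 2^n - 1 for n labels, so this only works when n is small.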

simsong commented 6 years ago

This issue was moved to angleto/liblinear#15