devarajphukan / Multi-Class-Text-Classifier-HackerRank

Solution to a Hacker-Rank Machine Learning Question (Document Classifier)

how to specify the probability? #1

Open marn65 opened 6 years ago

marn65 commented 6 years ago

Hi, this code works great for me, thank you. But I have a question: how can I get the probability of a sentence belonging to each class? For example, I test the sentence "this is a book" and the classifier says it belongs to class=1. But I want to see something like this: it belongs to class=1 with 76.44, it belongs to class=2 with 6.44, it belongs to class=3 with 7.47, ...

Could you please help me?

devarajphukan commented 6 years ago

@marn65 The classifier used is LinearSVC, which you can see on line 45. It belongs to a family of SVM classifiers that pick the class with a decision function rather than by taking the most likely class from a probability distribution, so it has no predict_proba of its own. To get probabilities from this family of classifiers, you can wrap the model in CalibratedClassifierCV from sklearn.calibration:

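from sklearn.calibration import CalibratedClassifierCV
from sklearn.svm import LinearSVC

svm = LinearSVC()
clf = CalibratedClassifierCV(svm)
clf.fit(X_train, y_train)
y_proba = clf.predict_proba(X_test)  # one row per sample, one column per class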

You can read more about it in this Stack Overflow question: https://stackoverflow.com/questions/26478000/converting-linearsvcs-decision-function-to-probabilities-scikit-learn-python
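
To get output in the per-class format asked for above, the calibrated probabilities can be paired with `clf.classes_`. The following is only a minimal sketch, not the repository's code: the toy corpus, the labels, the TF-IDF features, and `cv=2` are all assumptions chosen so the example runs on its own. With so little data the printed numbers are not meaningful; only the shape of the output is.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Hypothetical toy corpus: three classes with four sentences each.
train_texts = [
    "this book is a great novel",
    "I read a book about history",
    "the author wrote a new book",
    "a library full of old books and novels",
    "the team won the football match",
    "he scored a goal in the game",
    "the players trained for the match",
    "the coach praised the football team",
    "the recipe needs flour and sugar",
    "she cooked pasta for dinner",
    "bake the cake in the oven",
    "add salt to the soup recipe",
]
train_labels = np.array([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3])

# Turn the raw sentences into TF-IDF feature vectors (an assumed choice).
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)

# Wrap LinearSVC so that predict_proba becomes available.
# cv=2 is used only because the toy corpus is tiny.
clf = CalibratedClassifierCV(LinearSVC(), cv=2)
clf.fit(X_train, train_labels)

# Probabilities for one test sentence: shape (1, number of classes).
X_test = vectorizer.transform(["this is a book"])
proba = clf.predict_proba(X_test)

# Columns of proba follow the order of clf.classes_.
for label, p in zip(clf.classes_, proba[0]):
    print(f"it belongs to class={label} with {100 * p:.2f}")
```

Each row of `proba` sums to 1, so multiplying by 100 gives percentages like the ones in the question.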