AmazaspShumik / sklearn-bayes

Python package for Bayesian Machine Learning with scikit-learn API
MIT License

predict_proba generating predictions > number of classes #24

Open Blair-Young opened 7 years ago

Blair-Young commented 7 years ago

I'm training my classifier using

clf = RVC(kernel='rbf')
clf.fit(embeddings, labelsNum)

where the number of labels is 10.

When I inspect the clf I get this:

with open('RVC.pkl', 'rb') as rvc:
    le_rvc, clf_rvc = pickle.load(rvc)

array(['Ariel_Sharon', 'Colin_Powell', 'Donald_Rumsfeld', 'George_W_Bush',
       'Gerhard_Schroeder', 'Hugo_Chavez', 'Jean_Chretien',
       'John_Ashcroft', 'Junichiro_Koizumi', 'Tony_Blair'], 
      dtype='|S17')

Which is correct, 10 classes.

However, when I try to predict my test set by running this

predictions = clf.predict_proba(rep).ravel()
maxI = np.argmax(predictions)
person = le.inverse_transform(maxI)
confidence = predictions[maxI]

the length of predictions is 20

This means that le.inverse_transform(maxI) fails whenever maxI is 10 or greater, since the valid label indices are 0 to 9.

I must be doing something wrong on my side, but is there a reason why the clf is predicting more values than needed?
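
One thing that could be checked here is the shape of the array before it is flattened. The sketch below is only an illustration, not a confirmed diagnosis: it assumes predict_proba follows the usual scikit-learn convention of returning an array of shape (n_samples, n_classes), which this thread does not confirm for RVC. The array `proba` is a made-up stand-in for clf.predict_proba(rep), and the value n_samples = 2 is hypothetical. Under that assumption, calling ravel() on a two-row input gives 20 values, and an argmax over the flattened array can land outside the range 0 to 9.

```python
import numpy as np

# Hypothetical sizes: two rows in `rep`, ten labels.
n_samples, n_classes = 2, 10

# Made-up stand-in for clf.predict_proba(rep): each row sums to 1.
rng = np.random.default_rng(0)
proba = rng.random((n_samples, n_classes))
proba /= proba.sum(axis=1, keepdims=True)

# Flattening merges all rows into one vector of n_samples * n_classes values.
flat = proba.ravel()
print(proba.shape, flat.shape)  # (2, 10) (20,)

# Taking the argmax per row keeps every index inside 0..n_classes-1.
best = np.argmax(proba, axis=1)
confidence = proba[np.arange(n_samples), best]
print(best, confidence)
```

If printing clf.predict_proba(rep).shape shows more than one row, taking the argmax along axis=1 instead of flattening would keep each index valid for le.inverse_transform.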

Blair-Young commented 7 years ago

Hey, just wondering how this bug fix is going?