NaturalNode / natural

general natural language facilities for node
MIT License

How to interpret the natural.BayesClassifier .value? #362

Open s-a opened 7 years ago

s-a commented 7 years ago

I trained a natural.BayesClassifier with a few words to classify text and return some culture information. Classifying the word sauerkraut returns the following result.

| Classification | Culture | Region | Value |
| --- | --- | --- | --- |
| DE | German(Deutsch) | Germany (Deutschland) | 0.00001437680159294961 |
| GB | Welsh (United Kingdom)(Cymraeg (Y Deyrnas Unedi… | Cyprus (Κύπρος) | 0.000007188400796474818 |
| LC | English (Saint Lucia)(English (Saint Lucia)) | | 0.000007188400796474814 |
| TK | Turkmen(Türkmen dili) | Tokelau (Tokelau) | 0.000007188400796474814 |
| AS | Assamese | American Samoa (American Sa… | 0.000007188400796474814 |
| CW | Dutch (Curaçao)(Nederlands (Curaçao)) | Netherlands (Nederlân) | 0.000007188400796474814 |

The first result is a good hit, but the following hits do not really have anything to do with the classified word. Can someone point me to docs regarding a classified item's value? I'd like to learn more about the probability behind a classification's value score, because not every request returns such a clear result.
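One way to make these numbers easier to read is to compare them with each other instead of reading them as absolute probabilities. The sketch below assumes the values are unnormalized scores (their tiny magnitudes suggest this, but the thread does not confirm it). It rescales the six scores from the table above so that they sum to 1 and computes the ratio between the top two. `toRelativeShares` is a hypothetical helper, not part of natural, and the shares only cover the rows shown here, not any other classes the classifier may know.

```javascript
// Scores copied from the table above (label -> raw classifier value).
const classifications = [
  { label: 'DE', value: 0.00001437680159294961 },
  { label: 'GB', value: 0.000007188400796474818 },
  { label: 'LC', value: 0.000007188400796474814 },
  { label: 'TK', value: 0.000007188400796474814 },
  { label: 'AS', value: 0.000007188400796474814 },
  { label: 'CW', value: 0.000007188400796474814 },
];

// Hypothetical helper: rescale raw scores so they sum to 1 over the given rows.
function toRelativeShares(rows) {
  const total = rows.reduce((sum, row) => sum + row.value, 0);
  // Avoid dividing by zero when every score is 0.
  if (total === 0) return rows.map((row) => ({ label: row.label, share: 0 }));
  return rows.map((row) => ({ label: row.label, share: row.value / total }));
}

const shares = toRelativeShares(classifications);

// How many times larger the best score is than the runner-up.
const topRatio = classifications[0].value / classifications[1].value;

console.log(shares);
console.log('top / runner-up ratio:', topRatio);
```

Read this way, DE scores about twice as high as each of the other five rows, which all look like the same baseline score. That matches the impression that only the first hit is meaningful.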

zawadzkip commented 6 years ago

I would be interested in seeing this for Logistic Regression as well.

As an aside, are there any good examples for properly training/evaluating these models?
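As a rough starting point for evaluation, one common approach is to hold back some labelled samples from training and measure how often the predicted label matches. The sketch below is not taken from natural's docs. `toyClassify` is a hypothetical stand-in for a trained classifier, and the held-out samples are made up for illustration.

```javascript
// Hypothetical stand-in for a trained classifier: any function text -> label.
function toyClassify(text) {
  return /sauerkraut|bratwurst/i.test(text) ? 'DE' : 'GB';
}

// Fraction of held-out samples whose predicted label matches the expected one.
function accuracy(classify, samples) {
  if (samples.length === 0) return 0;
  const correct = samples.filter((s) => classify(s.text) === s.label).length;
  return correct / samples.length;
}

// Made-up held-out samples that were never used for training.
const heldOut = [
  { text: 'sauerkraut', label: 'DE' },
  { text: 'bratwurst', label: 'DE' },
  { text: 'fish and chips', label: 'GB' },
  { text: 'pretzel', label: 'DE' },
];

const score = accuracy(toyClassify, heldOut);
console.log('held-out accuracy:', score);
```

With a real natural classifier, the first argument could be a wrapper such as `(text) => classifier.classify(text)`, so the same helper works for both the Bayes and the Logistic Regression classifiers.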

igilham commented 6 years ago

I found the same issue when testing the Bayes classifier for simple inputs.

I haven't tried the example in the OP, but I noticed that switching to the Logistic Regression classifier gave values more like confidence percentages, i.e. 0.98 for good matches and 0.0013 for very bad matches.

paulociencia commented 4 years ago

Does anybody have any updates on the question above, or any documentation? I would appreciate it. I'm facing the same issue: I get very low scores here but good ones from other libraries. Thank you.