agelencs / moa

Automatically exported from code.google.com/p/moa

Training NaiveBayes on stream from MongoDB #13

Closed. GoogleCodeExporter closed this issue 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Connect to the "Iris" collection in MongoDB using the Java API. Get a 
DBCursor object that points to the whole learning set. 

2. Initialize the dataset with a "schema" taken from the first Mongo document 
in the collection. Treat every Attribute as nominal (use DBCursor.distinct()) 
or string type.

3. Declare a NaiveBayes and initialize it with prepareForUse().

4. Convert each BSON to an Instance using BSON.toMap() and the 
SparseInstance(double weight, double[] atrValues) constructor, passing attribute 
values that match their value index in the dataset. Use setDataset() to give the 
Instance a reference to the Attributes.

5. Train NaiveBayes with NaiveBayes.trainOnInstanceImpl() or 
NaiveBayes.trainOnInstance().

6. Print out NaiveBayes.getVotesForInstance(inst).
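
For readers following step 4, the encoding of nominal values into a double[] of value indices can be sketched without any MOA or MongoDB dependency. This is only an illustration under stated assumptions: NominalEncoder, addAttribute and encode are hypothetical names, not part of MOA or the Mongo driver, and using NaN for a missing or unseen value is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of MOA or the MongoDB driver.
final class NominalEncoder {
    // Attribute name -> ordered list of its distinct nominal values.
    // Insertion order defines the position of each attribute in the output array.
    private final Map<String, List<String>> schema = new LinkedHashMap<>();

    void addAttribute(String name, List<String> distinctValues) {
        schema.put(name, new ArrayList<>(distinctValues));
    }

    // Turns one document (already converted to a Map) into an array of value
    // indices in schema order. Missing or unseen values become NaN (assumption).
    double[] encode(Map<String, ?> document) {
        double[] values = new double[schema.size()];
        int i = 0;
        for (Map.Entry<String, List<String>> attr : schema.entrySet()) {
            Object raw = document.get(attr.getKey());
            int index = raw == null ? -1 : attr.getValue().indexOf(String.valueOf(raw));
            values[i++] = index >= 0 ? index : Double.NaN;
        }
        return values;
    }
}
```

The resulting array is the kind of input the report passes as atrValues; whether the class attribute is encoded the same way depends on how the dataset header was built.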

What is the expected output? What do you see instead?
There should be probabilities for the 3 Iris classes that appear non-random and 
are greater than 0. 

Instead I get values that differ but are very small (on the order of E-15) for 
every class value. Basic(...)Evaluator gives 0 for correctly classified and 
also 0 for incorrectly classified.  

What version of the product are you using? On what operating system?
Linux Ubuntu 12.04.5 / MOA 2014.04a

Original issue reported on code.google.com by p.s.gliniecki on 2 Sep 2014 at 1:54

GoogleCodeExporter commented 8 years ago
PROBLEM SOLVED

Original comment by p.s.gliniecki on 2 Sep 2014 at 2:37

GoogleCodeExporter commented 8 years ago
How did you solve the problem? Can we close the issue?

Original comment by abi...@gmail.com on 3 Sep 2014 at 2:12

GoogleCodeExporter commented 8 years ago
It wasn't really a problem. I got very low scores for the classes and thought 
they should be probabilities. I know now that they are not, although taking the 
largest score resulted in an accuracy of ~0.85 on Iris. 

Sorry for not explaining it in the first comment. 
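
For anyone who lands here with the same question: if the votes are treated as unnormalized scores, as described above, they can be rescaled to sum to 1 and the predicted class is the index of the largest one. A minimal sketch, assuming the votes come back as a plain double[]; VoteUtils, normalize and argMax are hypothetical names for this illustration and not MOA API.

```java
// Hypothetical helper for illustration; not part of MOA.
final class VoteUtils {
    // Rescales non-negative scores so they sum to 1.
    // Returns all zeros when there is no evidence (sum is 0).
    static double[] normalize(double[] votes) {
        double sum = 0.0;
        for (double v : votes) {
            sum += v;
        }
        double[] out = new double[votes.length];
        if (sum <= 0.0) {
            return out;
        }
        for (int i = 0; i < votes.length; i++) {
            out[i] = votes[i] / sum;
        }
        return out;
    }

    // Index of the largest score, or -1 for an empty array.
    // Ties go to the lowest index.
    static int argMax(double[] votes) {
        int best = -1;
        for (int i = 0; i < votes.length; i++) {
            if (best < 0 || votes[i] > votes[best]) {
                best = i;
            }
        }
        return best;
    }
}
```

Normalizing does not change which class wins, so the accuracy obtained by taking the largest raw score stays the same.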

Original comment by p.s.gliniecki on 3 Sep 2014 at 10:25

GoogleCodeExporter commented 8 years ago
Thanks! I'll close it then.

Original comment by abi...@gmail.com on 3 Sep 2014 at 10:54