chrinide / yooreeka

A library for data mining, machine learning, soft computing, and mathematical analysis

Questions about calculating P(a|C) in NaiveBayes.getProbability(Instance i, Concept c) #17

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago

In NaiveBayes:

    public double getProbability(Instance i, Concept c) {

        double cP = 1;

        for (Attribute a : i.getAttributes()) {

            if (a != null && attributeList.contains(a.getName())) {

                Map<Attribute, AttributeValue> aMap = p.get(c);
                AttributeValue aV = aMap.get(a);
                if (aV == null) {
                    // the specific attribute value is not present for the
                    // current concept.
                    // Can you justify the following estimate?
                    // Can you think of a better choice?
                    cP *= ((double) 1 / (tSet.getSize() + 1));
                } else {
                    cP *= (aV.getCount() / conceptPriors.get(c));
                }
            }
        }

        return (cP == 1) ? (double) 1 / tSet.getNumberOfConcepts() : cP;
    }
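The comment in the code asks whether there is a better choice than `1 / (tSet.getSize() + 1)` for an attribute value never seen with the current concept. One common alternative is add-one (Laplace) smoothing. The following is a minimal sketch of that idea, not yooreeka code: the class name, method name, and parameters are all hypothetical, and it assumes you can count how many distinct values the attribute takes.

```java
public class LaplaceSmoothingSketch {

    /**
     * Add-one (Laplace) estimate of P(a = v | C). All parameters are hypothetical:
     *
     * @param valueCount     instances of concept C in which attribute a has value v
     * @param conceptCount   total instances of concept C
     * @param distinctValues number of distinct values attribute a can take
     */
    public static double smoothed(int valueCount, int conceptCount, int distinctValues) {
        // Adding 1 to every count keeps unseen values above zero, and adding
        // distinctValues to the denominator keeps the estimates summing to 1.
        return (valueCount + 1.0) / (conceptCount + distinctValues);
    }

    public static void main(String[] args) {
        // An attribute with 3 possible values, observed 4, 4 and 0 times in 8 instances.
        System.out.println("seen value:   " + smoothed(4, 8, 3));
        System.out.println("unseen value: " + smoothed(0, 8, 3));
    }
}
```

Unlike the fixed `1 / (N + 1)` fallback, this estimate treats seen and unseen values with the same formula, so the probabilities over all values of one attribute still sum to 1.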

Here we calculate P(a|C):
P(a|C) = aV.getCount() / conceptPriors.get(c), where conceptPriors.get(c) is the
number of instances in concept C.

My question is: should we compute it like this instead: P(a|C) = aV.getCount() /
(count of all words in C)?

Original issue reported on code.google.com by uxg...@gmail.com on 26 Dec 2013 at 9:37

GoogleCodeExporter commented 9 years ago
Well, first of all, you need to define what you mean by "word".

If I understand what you are asking, the answer is no. 
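A small sketch of why the choice of denominator matters, under the assumption (my reading of the code above, not something confirmed in this thread) that each instance carries exactly one value per named attribute. The toy data, class name, and method names are hypothetical and are not part of yooreeka. It compares dividing by the number of instances in the concept with dividing by the total number of attribute values ("words") in the concept.

```java
import java.util.List;
import java.util.Map;

public class DenominatorSketch {

    // Hypothetical toy data: all instances of one concept, as attribute name -> value.
    static final List<Map<String, String>> INSTANCES = List.of(
            Map.of("color", "red", "shape", "round"),
            Map.of("color", "red", "shape", "square"),
            Map.of("color", "blue", "shape", "round"),
            Map.of("color", "red", "shape", "round"));

    // Number of instances in which the attribute has the given value.
    static long count(String attribute, String value) {
        return INSTANCES.stream().filter(m -> value.equals(m.get(attribute))).count();
    }

    // Denominator = number of instances in the concept.
    public static double perInstance(String attribute, String value) {
        return (double) count(attribute, value) / INSTANCES.size();
    }

    // Denominator = total number of attribute values ("words") in the concept.
    public static double perSlot(String attribute, String value) {
        int slots = INSTANCES.stream().mapToInt(Map::size).sum();
        return (double) count(attribute, value) / slots;
    }

    public static void main(String[] args) {
        System.out.println("per instance: "
                + (perInstance("color", "red") + perInstance("color", "blue")));
        System.out.println("per slot:     "
                + (perSlot("color", "red") + perSlot("color", "blue")));
    }
}
```

With the per-instance denominator, the estimates over the values of `color` sum to 1, so they form a proper conditional distribution for that attribute. With the total-count denominator they sum to 0.5 here, because the counts of the other attribute inflate the denominator. A total-count denominator is the usual choice in a multinomial model over word occurrences, which is a different setup from one value per attribute.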

Original comment by ba...@marmanis.com on 5 Jan 2014 at 11:56

GoogleCodeExporter commented 9 years ago

Original comment by ba...@marmanis.com on 11 Jan 2014 at 9:01