Open bwbaugh opened 11 years ago
Now, using conditional probabilities only (instead of trying to classify each feature as its own document):
<span style="color: #808080" title="neutral: 51.99%">('__start__', u'This')</span>
<span style="color: #808080" title="neutral: 52.04%">(u'This',)</span>
<span style="color: #808080" title="neutral: 54.68%">(u'This', u'is')</span>
<span style="color: #808080" title="neutral: 56.23%">(u'is',)</span>
<span style="color: #808080" title="neutral: 61.14%">(u'is', u'only')</span>
<span style="color: #808080" title="neutral: 56.40%">(u'only',)</span>
<span style="color: #c0ad00" title="negative: 54.75%">(u'only', u'a')</span>
<span style="color: #808080" title="neutral: 54.63%">(u'a',)</span>
<span style="color: #c06500" title="negative: 73.62%">(u'a', u'test')</span>
<span style="color: #808080" title="neutral: 52.74%">(u'test',)</span>
<span style="color: #808080" title="neutral: 65.38%">(u'test', '__end__')</span> <br>
Perhaps the prior probabilities skew the overall classification so much that a single feature isn't capable of overcoming them. Now that I think about it, why are we throwing away the confidence value from the classification process and re-calculating it from the conditionals? Which is the correct approach?
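To make the suspicion about the priors concrete, here is a minimal sketch of how a naive Bayes posterior for a single feature can flip depending on whether the priors are included. All numbers and the `posterior` helper are hypothetical and for illustration only; they are not taken from the trained model or from this project's code.

```python
def posterior(priors, likelihoods):
    """Normalize P(label) * P(feature | label) into P(label | feature)."""
    joint = {label: priors[label] * likelihoods[label] for label in priors}
    total = sum(joint.values())
    return {label: value / total for label, value in joint.items()}


# Hypothetical values: a prior that leans heavily towards neutral, and a
# feature whose conditional probability is highest under negative.
priors = {'neutral': 0.60, 'positive': 0.25, 'negative': 0.15}
likelihoods = {'neutral': 0.010, 'positive': 0.008, 'negative': 0.020}

# Posterior as the full classification process would compute it.
with_priors = posterior(priors, likelihoods)

# "Conditionals only" is equivalent to assuming a uniform prior.
uniform = {label: 1.0 / len(priors) for label in priors}
conditionals_only = posterior(uniform, likelihoods)

print(max(with_priors, key=with_priors.get))
print(max(conditionals_only, key=conditionals_only.get))
```

With these made-up numbers the feature comes out as neutral when the priors are included and as negative when only the conditionals are used, so the two approaches can legitimately disagree on a document of length one.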
When we use the original confidence value from the classification process, we get:
<span style="color: #808080" title="neutral: 50.56%">('__start__', u'This')</span>
<span style="color: #a5c000" title="positive: 56.90%">(u'This',)</span>
<span style="color: #808080" title="neutral: 50.56%">(u'This', u'is')</span>
<span style="color: #a5c000" title="positive: 56.90%">(u'is',)</span>
<span style="color: #808080" title="neutral: 50.56%">(u'is', u'only')</span>
<span style="color: #a5c000" title="positive: 56.90%">(u'only',)</span>
<span style="color: #808080" title="neutral: 50.56%">(u'only', u'a')</span>
<span style="color: #a5c000" title="positive: 56.90%">(u'a',)</span>
<span style="color: #808080" title="neutral: 50.56%">(u'a', u'test')</span>
<span style="color: #a5c000" title="positive: 56.90%">(u'test',)</span>
<span style="color: #808080" title="neutral: 50.56%">(u'test', '__end__')</span> <br>
Why are there only two unique confidence values across all features? Shouldn't the individual conditional probabilities cause at least some variation?
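One observation that may help narrow this down: in the output above, the two values split exactly by n-gram length. The short sketch below just groups the displayed results by feature length; the `results` list is transcribed by hand from the output above, and the variable names are only for illustration.

```python
from collections import defaultdict

# (feature, label, confidence in percent), copied from the output above.
results = [
    (('__start__', 'This'), 'neutral', 50.56),
    (('This',), 'positive', 56.90),
    (('This', 'is'), 'neutral', 50.56),
    (('is',), 'positive', 56.90),
    (('is', 'only'), 'neutral', 50.56),
    (('only',), 'positive', 56.90),
    (('only', 'a'), 'neutral', 50.56),
    (('a',), 'positive', 56.90),
    (('a', 'test'), 'neutral', 50.56),
    (('test',), 'positive', 56.90),
    (('test', '__end__'), 'neutral', 50.56),
]

# Collect the distinct (label, confidence) pairs seen for each n-gram length.
by_length = defaultdict(set)
for feature, label, confidence in results:
    by_length[len(feature)].add((label, confidence))

for length in sorted(by_length):
    print(length, sorted(by_length[length]))
```

Every unigram gets the same result and every bigram gets the same result, regardless of which words are in the feature. This does not identify the cause, but it suggests the feature text itself may not be influencing the confidence here, which would be worth checking first.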
Part of the web interface is supposed to show how each feature would be classified if it was a document of length one. Why does the hierarchical sentiment classifier only label these individual features as either neutral or positive, even when the confidence value is less than 0.5? As an example:
Current hash: 5fd9baa3551fc1c0af4692cbae7a589ff1ea21e4