product of probabilities as opposed to sum of logs

GoogleCodeExporter commented 8 years ago

Hi, 
First, I have to say that this package is really great and very well
written. However, I have one main issue with it that made me stop using it:

The algorithms use raw probabilities in [0, 1] rather than log probabilities,
so a product is used where a sum of logs would be. When I use this package with
a lot of features and a lot of hidden states, the algorithms fail: I get
NaN values because extremely small numbers are divided by extremely small
numbers, and sometimes 0 is divided by 0.
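
The failure described above can be reproduced outside the package. The following is a minimal sketch, not JaHMM code; the class and method names are made up for illustration. It multiplies 100 small probabilities directly and also accumulates the same quantity as a sum of logs:

```java
public class UnderflowDemo {
    // Hypothetical helper: multiplies n copies of p directly.
    public static double product(double p, int n) {
        double result = 1.0;
        for (int i = 0; i < n; i++) {
            result *= p;
        }
        return result;
    }

    // Hypothetical helper: accumulates the same quantity in log space.
    public static double sumOfLogs(double p, int n) {
        double result = 0.0;
        for (int i = 0; i < n; i++) {
            result += Math.log(p);
        }
        return result;
    }

    public static void main(String[] args) {
        // 1e-500 is far below the smallest positive double, so both products are 0.0.
        double numerator = product(1e-5, 100);
        double denominator = product(1e-5, 100);
        System.out.println(numerator / denominator); // 0.0 / 0.0 gives NaN

        // In log space the ratio becomes a difference of two finite numbers.
        System.out.println(sumOfLogs(1e-5, 100) - sumOfLogs(1e-5, 100)); // 0.0, i.e. log(1)
    }
}
```

The log-space values stay near -1151 here, well inside the range of a double, which is why the ratio survives.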

The package works well if you have a small data set with few states.

It would be very much appreciated, and very helpful, if this package were
converted to log probabilities everywhere (A matrix, pdf estimation,
decoding algorithms, etc.).

One more thing: will this package support multivariate Gaussian mixtures?

Thanks so much!
F.

Original issue reported on code.google.com by fadi.bia...@gmail.com on 19 Mar 2009 at 9:41

GoogleCodeExporter commented 8 years ago

Hi!

I also like this package. It's versatile, and almost everything I need for my
use of HMMs is available.

Like you, I also ran into the division-by-zero problem, since I use a
17-dimensional input vector with values ranging from 0 to 255. I've posted
about it in the discussion group as well. For now I have worked around these
divisions by zero by adding an if statement around each of them. It checks
whether the divisor is 0.0; if so, the calculation is skipped or an outcome of
0.0 is added instead.
I'm not that knowledgeable about HMMs and their algorithms, so my question is
whether this severely impacts the learning performance of an HMM. After adding
these if statements I never get NaN values, but (seemingly?) correct double
values.

What are your thoughts about this solution? Could you maybe elaborate on your
solution of using log probabilities instead of raw probabilities? If it
correctly remedies the division by zero and the division of extremely small
numbers (and my 'simple' solution does not), I can perhaps convert these
calculations per your instructions.

Thank you for replying!

P.S. JaHMM already supports multivariate Gaussian mixtures through the CLI
option 'multi_gaussian'; check the manual or the jahmm-cli -help option.

Original comment by m.s.ganz...@student.utwente.nl on 25 Mar 2009 at 3:04

GoogleCodeExporter commented 8 years ago

I don't think what you did is the right thing. Your model parameters will not
be correctly estimated, so the model will not fit the data properly. You might
get an A matrix whose rows do not sum to one, or even zeros everywhere
(depending on where the failure occurred), which makes your model useless.
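
One way to test for the damage described above is to check the learned transition matrix after training. This is a minimal sketch with hypothetical names; it assumes the A matrix has been copied into a plain `double[][]` and does not use any JaHMM API:

```java
public class TransitionCheck {
    // Returns true if every row contains only non-negative, non-NaN entries
    // that sum to 1 within the given tolerance.
    public static boolean isRowStochastic(double[][] a, double tol) {
        for (double[] row : a) {
            double sum = 0.0;
            for (double p : row) {
                if (Double.isNaN(p) || p < 0.0) {
                    return false;
                }
                sum += p;
            }
            if (Math.abs(sum - 1.0) > tol) {
                return false;
            }
        }
        return true;
    }
}
```

A matrix that passes this check can still be badly estimated, so it only catches the most obvious breakage.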

The solution I'm suggesting is a standard one, used in HMM packages and many
other machine learning packages. Instead of storing raw probabilities in the
model parameters, we should store log probabilities. The training and decoding
algorithms can then use sums instead of the products that drive these
extremely small numbers to 0.
There are so many places to change that, although I know what to do, I was
worried I would forget something and then report wrong numbers in my research.
So I decided to give up and use a different package.
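
To make the suggestion concrete, here is a minimal sketch of the forward algorithm in log space. It is not JaHMM code: the class, the method names, and the array layout (`logPi[i]`, `logA[i][j]`, and `logB[t][j]` for the log emission density of observation t in state j) are assumptions made for illustration. Products of probabilities become sums of logs; the places where the algorithm adds probabilities need a log-sum-exp helper instead:

```java
public class LogForward {
    // Computes log(exp(a) + exp(b)) without leaving log space.
    public static double logAdd(double a, double b) {
        if (a == Double.NEGATIVE_INFINITY) return b;
        if (b == Double.NEGATIVE_INFINITY) return a;
        double max = Math.max(a, b);
        return max + Math.log1p(Math.exp(-Math.abs(a - b)));
    }

    // Returns the log-likelihood of an observation sequence under the model.
    public static double logLikelihood(double[] logPi, double[][] logA, double[][] logB) {
        int n = logPi.length;
        double[] alpha = new double[n];
        for (int i = 0; i < n; i++) {
            alpha[i] = logPi[i] + logB[0][i];          // product becomes sum
        }
        for (int t = 1; t < logB.length; t++) {
            double[] next = new double[n];
            for (int j = 0; j < n; j++) {
                double acc = Double.NEGATIVE_INFINITY; // log(0)
                for (int i = 0; i < n; i++) {
                    acc = logAdd(acc, alpha[i] + logA[i][j]);
                }
                next[j] = acc + logB[t][j];
            }
            alpha = next;
        }
        double total = Double.NEGATIVE_INFINITY;
        for (double v : alpha) {
            total = logAdd(total, v);
        }
        return total;
    }
}
```

With emission densities around 1e-200 per step, the raw likelihood of even a three-step sequence underflows to 0.0, while the value returned here stays finite.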

It's sad because this package is really very well written and easy to use.

-F.

Original comment by fadi.bia...@gmail.com on 25 Mar 2009 at 3:25

GoogleCodeExporter commented 8 years ago

Thank you for your comments! I appreciate your quick response. It could very
well be that my model parameters are not estimated completely correctly. Then
again, these errors are in the BaumWelchScaledLearner algorithm; specifically,
the scaling factors (which make it a 'ScaledLearner', I assume) are sometimes
0.0. This perhaps means that no scaling is required for the forward and
backward values. Now, I'm not really into all the details of the
BaumWelchScaledLearner, or even the BaumWelchLearner algorithm for that
matter, but the HMMs generated by JaHMM do seem to be sound: no 0.0 or
funny values in the matrices whatsoever.
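
For reference, in the usual textbook formulation of the scaled forward pass, the scaling factor at a time step is the sum of the unscaled forward values at that step. The sketch below assumes that formulation; it is not taken from BaumWelchScaledLearner, whose implementation may differ, and the class and method names are hypothetical. Under this assumption a factor of 0.0 means that every state assigned zero (or underflowed) density to the observation:

```java
public class ScaledForwardStep {
    // Normalizes the forward values of one time step in place and returns
    // the scaling factor, i.e. their sum before normalization.
    public static double scale(double[] alpha) {
        double c = 0.0;
        for (double a : alpha) {
            c += a;
        }
        if (c == 0.0) {
            // Dividing by c here would produce NaN in every state.
            throw new ArithmeticException(
                "all forward values are zero at this time step");
        }
        for (int i = 0; i < alpha.length; i++) {
            alpha[i] /= c;
        }
        return c;
    }
}
```

Failing loudly at this point, rather than skipping the division, at least makes it visible which observation the model cannot explain.
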
Then again, my current HMM for recognizing a fist from the cyberglove output
has only two states, either fist or no fist. It recognizes it surprisingly
well, according to my very preliminary results. And I've used just 11
observation sequences for the k-means learner and another 11 for improving the
HMM with the Baum-Welch algorithm.
However, I'm curious to see how the HMM (and JaHMM specifically) performs when
I use more observation sequences to improve it and, of course, when I scale it
up to recognize more than one form of the hand. It could very well be that the
performance will then drop.

For now I don't really see a reason to change the package I use, but who
knows, maybe in the future I will run into JaHMM's drawbacks. However, I'm
definitely interested in the package you're using. Is it in Java? Or is it HTK
perhaps? What are your findings?

Regards!

MaGaM

Original comment by m.s.ganz...@student.utwente.nl on 26 Mar 2009 at 11:20

ZapucAlexandra / jahmm

product of probabilities as opposed to sum of logs #1