GoogleCodeExporter opened this issue 8 years ago
I guess this is a problem close to the one I have at the moment. Maybe your
values end up with a very low variance, almost 0. Due to rounding inaccuracy
(because the number is so low: 0,00000.....1), it is interpreted as zero and
throws the exception.
In my case I'm trying to remove the if-clause that throws the exception. I
can't tell for sure whether that works for your case (I can't even tell for
mine yet... I still have to check the results for correctness). But I couldn't
find any other way, since the collected data (vectors, in my case) doesn't
give information about the resulting variances directly. In my case I would
never know at the beginning how my data is combined and in which matrices it
ends up. Covariance shouldn't be <= 0 anyway in my case... (remember: I'm
working with vectors of reals!)
Maybe you can give it a try with your usage and report your results...
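For illustration, the workaround could also go the other way: instead of removing the check, floor the variance at a tiny positive value before it reaches the GaussianDistribution constructor. This is only a sketch of the idea; the class name, method, and the 1e-12 floor are mine, not jahmm's:

```java
// Hypothetical sketch: clamp a near-zero (or rounded-negative) variance to a
// small positive floor instead of letting it trigger
// "Variance must be positive". All names here are illustrative.
public class VarianceFloor {
    static final double MIN_VARIANCE = 1e-12; // illustrative floor

    static double clamp(double variance) {
        if (variance <= 0.0 || Double.isNaN(variance))
            return MIN_VARIANCE; // avoid the IllegalArgumentException path
        return Math.max(variance, MIN_VARIANCE);
    }

    public static void main(String[] args) {
        System.out.println(clamp(0.0));   // floored to 1e-12
        System.out.println(clamp(0.5));   // left unchanged
    }
}
```

Whether a floored variance gives statistically meaningful results still has to be checked, as discussed above.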
kind regards,
Ben
Original comment by vamos.be...@gmail.com
on 27 May 2010 at 10:19
I have the same problem with "Variance must be positive". In the case of the
K-Means clustering algorithm, I found the problem arises when the clustering
produces a cluster with a single element (obviously the variance of a single
element is zero). A solution may be to prevent the clustering algorithm from
producing such a cluster; I'm trying to adjust the source code in this way.
But I found the same problem with the Baum-Welch algorithm too, as the
following exception trace shows:
Exception in thread "main" java.lang.IllegalArgumentException: Variance must be positive
at be.ac.ulg.montefiore.run.distributions.GaussianDistribution.<init>(GaussianDistribution.java:59)
at be.ac.ulg.montefiore.run.jahmm.OpdfGaussian.fit(OpdfGaussian.java:139)
at be.ac.ulg.montefiore.run.jahmm.learn.BaumWelchLearner.iterate(BaumWelchLearner.java:139)
at be.ac.ulg.montefiore.run.jahmm.learn.BaumWelchLearner.learn(BaumWelchLearner.java:172)
I haven't investigated yet how this Baum-Welch problem arises, but I'm going
to, because I need this library for some experiments. Maybe I'll post a reply
if I find a solution.
Have you solved the problem in some manner in the meantime?
Kind regards,
Simone.
Original comment by simbo1...@gmail.com
on 21 Sep 2010 at 3:31
I solved the problem for K-Means clustering. I modified the clustering
algorithm so that it cannot create a cluster with a single element. I also had
to modify the initial check on the number of elements versus the number of
clusters: now there must be at least 2*k elements (so that each cluster can
contain at least 2 elements). For now it's an additional pass after the normal
clustering: if it finds a cluster with only one element, it performs a
redistribution, taking an element from a nearby cluster. If someone is
interested in this modification, contact me.
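For anyone curious, the redistribution idea could be sketched roughly like this. This is my own illustrative reconstruction, not the actual modified KMeansCalculator: it works on 1-D points and moves the closest point from a larger neighbouring cluster into each singleton.

```java
import java.util.*;

// Illustrative sketch of the post-clustering pass described above: every
// singleton cluster steals the nearest point from a cluster that can spare
// one (i.e. that would keep at least 2 elements). Names are hypothetical.
public class SingletonFix {
    // clusters: each inner list holds the 1-D points of one cluster
    static void fixSingletons(List<List<Double>> clusters) {
        for (List<Double> c : clusters) {
            if (c.size() != 1) continue;          // only fix singletons
            double center = c.get(0);
            List<Double> donor = null;
            double best = Double.POSITIVE_INFINITY;
            for (List<Double> other : clusters) {
                if (other == c || other.size() <= 2) continue; // donor keeps >= 2
                for (double p : other) {
                    double d = Math.abs(p - center);
                    if (d < best) { best = d; donor = other; }
                }
            }
            if (donor != null) {
                // move the donor's point closest to the singleton's element
                double moved = donor.get(0);
                for (double p : donor)
                    if (Math.abs(p - center) < Math.abs(moved - center))
                        moved = p;
                donor.remove(Double.valueOf(moved));
                c.add(moved);
            }
        }
    }
}
```

As noted later in this thread, this only guarantees two *elements* per cluster, not two *distinct values*, so a zero variance is still possible with repeated observations.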
Simone.
Original comment by simbo1...@gmail.com
on 22 Sep 2010 at 6:05
I think your way of avoiding this problem is partly pretty good. The thing is:
depending on what you observe, it can more or less likely happen that a
cluster is filled with only identical elements (maybe because one observation
appears again and again). Then you still have zero variance, even though there
is more than one element in the cluster. Even if it is unlikely in your
problem that a cluster is filled with identical elements, it can happen at
some point, and then the algorithm crashes.
Another problem: say you are learning clusters with a VERY low variance. A bit
later you find a new observation and want to learn it into your existing HMM.
Because of the low variance in your clusters (and also due to rounding on your
PC, when the small distance between the observation and a cluster drifts to
zero), the algorithm decides that your new observation doesn't fit any
cluster. Now you end up with the next problem: learning doesn't work and
aborts. (I think it ends up as NaN because of a division by zero.)
There are several problems that follow from each other or are related to this
one. It is a kind of inherent numerical problem which the k-means algorithm
simply can't deal with. I think there are algorithms which could deal with it,
but they are not implemented here.
Those problems appear more often when similar observations occur regularly,
which can happen in most systems.
Original comment by vamos.be...@gmail.com
on 1 Oct 2010 at 5:25
Yes, my solution is not valid in all cases.
I adjusted it for my purposes: my observations were real numbers from "real
world" measurements, so I knew it was very unlikely to obtain the same value
two or more times (indeed I never found such observations, while I often found
many single-observation clusters).
As for the "no fit" problem, I'm not sure I have understood you: the
clustering algorithm takes all the data and tries to fit it into clusters,
and then, based on the clustering, produces an HMM. The clustering algorithm
searches for the nearest cluster when it processes a new point, so the
variance shouldn't matter.
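At assignment time, plain k-means indeed only compares distances to the centroids, something like this (an illustrative sketch with hypothetical names, not jahmm's actual KMeansCalculator):

```java
// Minimal sketch of nearest-centroid assignment for 1-D points: the cluster
// variance plays no role here, only the distance to each centroid.
public class NearestCluster {
    static int assign(double point, double[] centroids) {
        int best = 0;
        for (int i = 1; i < centroids.length; i++)
            if (Math.abs(point - centroids[i]) < Math.abs(point - centroids[best]))
                best = i;
        return best;
    }
}
```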
I apologize for my English.
Simone.
Original comment by simbo1...@gmail.com
on 1 Oct 2010 at 6:04
No, the clustering algorithm doesn't. I was looking a bit ahead: if you try to
learn a new sequence with the Baum-Welch algorithm into your existing
k-means-learnt HMM, and your states have very little variance, then at this
point the new observation cannot be learnt: it ends up in a NaN error because
the already small distance rounds to zero (due to the little variance). So it
looks as if the new observation absolutely doesn't fit your HMM.
Does that make sense?
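The effect is easy to reproduce in isolation. The snippet below is only an illustration (a plain Gaussian density, not jahmm's OpdfGaussian): with a tiny variance, the density of a slightly off-mean observation underflows to 0.0, and any later normalization that divides by it produces the NaN.

```java
// Demonstrates the underflow chain described above: tiny variance ->
// density underflows to exactly 0.0 -> a later division by the (zero)
// total likelihood yields NaN.
public class UnderflowDemo {
    static double gaussianPdf(double x, double mean, double variance) {
        double d = x - mean;
        return Math.exp(-d * d / (2.0 * variance))
                / Math.sqrt(2.0 * Math.PI * variance);
    }

    public static void main(String[] args) {
        double p = gaussianPdf(1.0, 0.0, 1e-6); // observation 1.0, tiny variance
        System.out.println(p);      // 0.0  (exp(-500000) underflows)
        System.out.println(p / p);  // NaN  (the division by zero)
    }
}
```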
Original comment by vamos.be...@gmail.com
on 1 Oct 2010 at 9:35
If someone is interested in my solution to the "Variance must be positive"
problem when it comes from the K-means clustering algorithm (i.e. when it
creates single-element clusters), I attach here my modified KMeansCalculator
class.
Simone.
Original comment by simbo1...@gmail.com
on 16 Jan 2011 at 10:04
Attachments:
Original issue reported on code.google.com by
naughtyn...@gmail.com
on 13 Jan 2010 at 5:52