Closed: jiunsiew closed this issue 6 years ago.
Thanks for reporting. There are 2 issues:
In `transform` we add the topic prior, and in `fit_transform` we don't. This is a bug and I will fix it.

@jiunsiew please try the development version.
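To see why adding the topic prior in one method but not the other produces the symptoms described in the report, here is a minimal sketch. It assumes that a document's topic distribution is obtained by normalizing its per-document topic counts, either as-is or after adding a symmetric Dirichlet prior to every topic. This is not the package's actual implementation; the function name `doc_topic_distribution`, the counts, and the prior value of 1.0 are all hypothetical and chosen only to make the effect visible.

```python
import numpy as np


def doc_topic_distribution(counts, doc_topic_prior=0.0):
    """Normalize per-document topic counts into probabilities.

    counts: array of shape (n_docs, n_topics) holding topic-assignment counts.
    doc_topic_prior: symmetric Dirichlet prior added to every cell
    (hypothetical value; 0.0 means no prior is added).
    """
    smoothed = np.asarray(counts, dtype=float) + doc_topic_prior
    # Divide each row by its total so every document's probabilities sum to 1.
    return smoothed / smoothed.sum(axis=1, keepdims=True)


# Hypothetical counts: one document with 20 tokens, assigned to only 2 of 4 topics.
counts = np.array([[12, 8, 0, 0]])

# Mirrors the code path that does NOT add the prior.
without_prior = doc_topic_distribution(counts)
# Mirrors the code path that DOES add the prior (prior value is illustrative).
with_prior = doc_topic_distribution(counts, doc_topic_prior=1.0)

print(without_prior.round(4).tolist())  # [[0.6, 0.4, 0.0, 0.0]]
print(with_prior.round(4).tolist())     # [[0.5417, 0.375, 0.0417, 0.0417]]
```

With the prior, the two topics that had zero counts each receive 1/24 of the probability mass, and the two observed topics shrink from 0.6 and 0.4 to about 0.54 and 0.375. That matches the report: previously zero topics become non-zero, and the others lose probability. A smaller prior reduces the effect, and applying the same normalization in both methods makes their outputs directly comparable.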
@dselivanov looks good now. Really appreciate the quick fix. Great stuff!
Hi there,
Firstly, thanks for your efforts in creating and maintaining such a great package. I've been using it to do some topic modelling.
The 'problem' I'm finding a bit hard to understand is that when using the `fit_transform` and `transform` methods in `LDA`, I get different topic distributions even though the data is the same.

I'm not overly surprised that the results are slightly different, given the generative process of the algorithm, but I was quite surprised at how different they were. In particular, when using `transform`, topics that previously had a probability of 0 now appear to be non-zero in a non-trivial sense. As a result, other topics that were previously non-zero get lower probabilities, and the issue is exacerbated when the probabilities of topics are rather close.

I've tried forcing a higher number of iterations in `transform` and setting the seed before calling `transform`, but it doesn't seem to make much difference.

Is this behaviour to be expected, or am I missing something? I would have thought that fitting the same data to the same model would give very similar results.
I've attached some of the code below that recreates what I'm seeing. Thanks.
Here are the results of the last two lines: