Open alexs131 opened 2 years ago
Hi @alexs131, thank you for reporting the bug. It looks like a floating-point precision problem.
Currently, the numerator (`doc.numByTopic`) and the denominator (`doc.getSumWordWeight()`) of the topic distribution are stored separately, and it seems that rounding errors in these values accumulate during training, especially on smaller datasets.
I'll investigate this problem more.
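To illustrate the mechanism described above, here is a minimal, standalone sketch; it is not tomotopy's actual code. It assumes the per-topic weights are kept in single precision (emulated by the hypothetical helper `f32`) and that fractional term weights are repeatedly subtracted from one topic and added to another, as a Gibbs sampler would do. The names `simulate` and `renormalize`, and all parameter values, are made up for the example.

```python
import random
import struct


def f32(x):
    """Round a Python float to the nearest single-precision value (assumption)."""
    return struct.unpack("f", struct.pack("f", x))[0]


def simulate(num_topics=20, num_words=50, sweeps=2000, seed=1):
    """Track per-topic weights incrementally while the total is stored once."""
    rng = random.Random(seed)
    # Hypothetical fractional term weights, standing in for PMI-style weighting.
    weights = [rng.uniform(0.01, 3.0) for _ in range(num_words)]
    assign = [rng.randrange(num_topics) for _ in range(num_words)]

    num_by_topic = [0.0] * num_topics  # numerator, updated on every reassignment
    sum_weight = 0.0                   # denominator, computed once up front
    for w, z in zip(weights, assign):
        num_by_topic[z] = f32(num_by_topic[z] + w)
        sum_weight = f32(sum_weight + w)

    for _ in range(sweeps):
        for i, w in enumerate(weights):
            old = assign[i]
            num_by_topic[old] = f32(num_by_topic[old] - w)  # remove from old topic
            new = rng.randrange(num_topics)
            assign[i] = new
            num_by_topic[new] = f32(num_by_topic[new] + w)  # add to new topic
    return num_by_topic, sum_weight


def renormalize(values):
    """Divide by the actual sum of the values rather than a separately stored total."""
    total = sum(values)
    return [v / total for v in values]


if __name__ == "__main__":
    num_by_topic, sum_weight = simulate()
    raw = [n / sum_weight for n in num_by_topic]
    print("sum with separately stored denominator:", sum(raw))
    print("sum after renormalizing:", sum(renormalize(num_by_topic)))
```

In this sketch each update rounds independently, so the numerators can drift away from the stored denominator. Dividing by the sum of the numerators themselves is one possible workaround on the user side until the underlying issue is fixed.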
Hello, I have encountered an issue where the sum of the topic-word distribution also does not sum to 1. I am running version 0.12.1, with hyperparameters
tw=TermWeight.PMI,
gamma=1,
alpha=0.1,
eta=0.001,
initial_k=20,
seed=1.
I have run the HDP model previously on a different, larger dataset and did not encounter this issue. Thanks for any help here, and apologies if this is a misunderstanding on my part.