cpsievert / LDAvis

R package for web-based interactive topic model visualization.

KL-Divergence Implementation does not handle 0 probabilities #78

Closed carlosparadis closed 6 years ago

carlosparadis commented 7 years ago

Calling createJSON throws the following error:

Error in stats::cmdscale(dist.mat, k = 2) : NA values not allowed in 'd'

I traced it down to:

https://github.com/cpsievert/LDAvis/blob/51bb51e6f2dd26c9d495a76482018d94a9945ddc/R/createJSON.R#L298-L304
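For context, here is a minimal sketch of how a single NaN distance would lead to this error. It assumes the linked lines compute a Jensen-Shannon divergence of the form below and hand the pairwise matrix to stats::cmdscale. `js_naive`, `phi`, and `dist_mat` are hypothetical names and values for illustration, not copied from the package.

```r
# Assumed form of the divergence (an illustration, not the package source).
js_naive <- function(x, y) {
  m <- 0.5 * (x + y)
  0.5 * sum(x * log(x / m)) + 0.5 * sum(y * log(y / m))
}

# Hypothetical topic-term matrix, one row per topic; topic 3 has a zero entry.
phi <- rbind(c(0.2, 0.3, 0.5),
             c(0.3, 0.3, 0.4),
             c(0.5, 0.5, 0.0))

# Pairwise distances computed with base R only.
k <- nrow(phi)
dist_mat <- matrix(0, k, k)
for (i in seq_len(k)) {
  for (j in seq_len(k)) {
    dist_mat[i, j] <- js_naive(phi[i, ], phi[j, ])
  }
}

# Every distance involving topic 3 is NaN, and cmdscale rejects the matrix.
msg <- tryCatch({
  stats::cmdscale(dist_mat, k = 2)
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)
```

A zero anywhere in a topic's term distribution is enough to make the whole scaling step fail under these assumptions.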

To reproduce the issue:

Reproducible dataset

x <- c(0.2, 0.3, 0.3)
y <- c(0.2, 0.3, 0.4)
b <- c(0.2, 0.3, 0)  # contains a zero entry

Using the LDAvis implementation linked at the start of this issue:

> jensenShannon(x=x,y=y)
[1] 0.003583677
> jensenShannon(x=x,y=b)
[1] NaN
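The NaN arises because R evaluates 0 * log(0) as 0 * -Inf. One possible workaround, sketched here under the usual convention that a zero-probability term contributes 0 (the limit of p log p as p goes to 0), is to mask those terms. `js_naive` is an assumed reconstruction that reproduces the values printed above, and `js_safe` is a hypothetical name. Neither is claimed to be the code in the package or in any later fix.

```r
# Assumed form of the original computation (illustrative reconstruction).
js_naive <- function(x, y) {
  m <- 0.5 * (x + y)
  0.5 * sum(x * log(x / m)) + 0.5 * sum(y * log(y / m))
}

# Zero-safe variant: entries equal to 0 contribute 0 instead of NaN.
js_safe <- function(x, y) {
  m <- 0.5 * (x + y)
  # ifelse() computes both branches but only keeps the masked value where p == 0.
  kl_terms <- function(p) ifelse(p == 0, 0, p * (log(p) - log(m)))
  0.5 * sum(kl_terms(x)) + 0.5 * sum(kl_terms(y))
}

x <- c(0.2, 0.3, 0.3)
y <- c(0.2, 0.3, 0.4)
b <- c(0.2, 0.3, 0)

js_naive(x, b)  # NaN
js_safe(x, y)   # agrees with js_naive(x, y) when no entry is zero
js_safe(x, b)   # finite: 0.15 * log(2), about 0.104
```

Writing the term as p * (log(p) - log(m)) instead of p * log(p / m) is only a stylistic choice here; the masking is what removes the NaN.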

The same test, using the cosine function from the lsa package:

> cosine(x=x,y=y)
          [,1]
[1,] 0.9897595
> cosine(x=x,y=b)
          [,1]
[1,] 0.7687061

rnkazman commented 6 years ago

This seems like an implementation detail, not a principled reason to use one or the other. Is that correct?

cpsievert commented 6 years ago

Done in c7234d71168b1e946a361bc00593bc5c4bf8e57e

caitsimop commented 1 year ago

Hi there, I'm still getting this error in v0.3.5. Is this the most up-to-date version?