Nitro / scalda

Topic Modeling with LDA in Scala and Spark
MIT License

Unable to do topicProportions on loaded LDA model. #6

Open r-jenish opened 8 years ago

r-jenish commented 8 years ago

Saving the model as:

val lda = LocalOnlineLda(
    OnlineLdaParams(
        vocabulary = lines(vocabFile).toIndexedSeq,
        alpha = 1.0/numTopics,
        eta = 1.0/numTopics,
        decay = 128,
        learningRate = 0.7,
        maxIter = 1000,
        convergenceThreshold = 0.001,
        numTopics = numTopics,
        totalDocs = numDocs,
        perplexity = true
    )
)
val model = lda.inference(new TextFileIterator(corpusDir, mbSize))
lda.saveModel(model, new File("/home/xyz/tmp/lda_model"))

and loading it as:

val lda = LocalOnlineLda.empty
val model = lda.loadModel(new File("/home/xyz/tmp/lda_model")).get
val docloc = new File("/home/xyz/tmp/test_dataset/33629")
val testdoc = text(docloc)
val topicprops = lda.topicProportions(testdoc, model, Some(com.nitro.scalda.tokenizer.StanfordLemmatizer()))

This throws java.lang.IllegalArgumentException: requirement failed: Dimension mismatch! at the line val topicprops = lda.topicProportions(....).

Error Log:

Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: Dimension mismatch!
    at scala.Predef$.require(Predef.scala:224)
    at breeze.linalg.operators.DenseMatrixMultiplyStuff$implOpMulMatrix_DMD_DMD_eq_DMD$.apply(DenseMatrixOps.scala:53)
    at breeze.linalg.operators.DenseMatrixMultiplyStuff$implOpMulMatrix_DMD_DMD_eq_DMD$.apply(DenseMatrixOps.scala:48)
    at breeze.linalg.ImmutableNumericOps$class.$times(NumericOps.scala:135)
    at breeze.linalg.DenseMatrix.$times(DenseMatrix.scala:53)
    at com.nitro.scalda.models.onlineLDA.local.LocalOnlineLda$$anonfun$eStep$2.apply(LocalOnlineLDA.scala:83)
    at com.nitro.scalda.models.onlineLDA.local.LocalOnlineLda$$anonfun$eStep$2.apply(LocalOnlineLDA.scala:74)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
    at com.nitro.scalda.models.onlineLDA.local.LocalOnlineLda.eStep(LocalOnlineLDA.scala:74)
    at com.nitro.scalda.models.onlineLDA.local.LocalOnlineLda.topicProportions(LocalOnlineLDA.scala:270)
    at testmodel$.delayedEndpoint$testmodel$1(testmodel.scala:28)
    at testmodel$delayedInit$body.apply(testmodel.scala:7)
    at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
    at scala.App$$anonfun$main$1.apply(App.scala:76)
    at scala.App$$anonfun$main$1.apply(App.scala:76)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
    at scala.App$class.main(App.scala:76)
    at testmodel$.main(testmodel.scala:7)
    at testmodel.main(testmodel.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)

Scala version: 2.11.8. System: Ubuntu 14.04.

onema commented 5 years ago

I'm experiencing the same issue.

It seems the problem occurs because the example uses the LocalOnlineLda.empty method to create the lda object.

https://github.com/Nitro/scalda/blob/74c585af40db6e4426a3903aa50f9dd743c0b2ec/src/main/scala/com/nitro/scalda/examples/TopicProportionsExample.scala#L25

When using empty, numTopics is set to zero (0). Later, this value is used to size initialGamma, which ends up as an empty matrix (1 row, 0 columns).

https://github.com/Nitro/scalda/blob/74c585af40db6e4426a3903aa50f9dd743c0b2ec/src/main/scala/com/nitro/scalda/models/onlineLDA/local/LocalOnlineLDA.scala#L264-L268

From what I can tell, this matrix is used in several operations and eventually fails because it is expected to have as many columns as the original model had topics; since numTopics was set to zero, the dimensions do not match 😞.

A possible solution is to use model.lambda.rows instead of params.numTopics in the topicProportions method, since model.lambda.rows matches the number of topics the model was trained with.

    // LocalOnlineLDA.topicProportions

    val initialGamma = new DenseMatrix[Double](
      1,
      model.lambda.rows,
      G(100.0, 1.0 / 100.0).sample(model.lambda.rows).toArray
    )
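To make the shape argument concrete, here is a minimal sketch in plain Scala with no Breeze dependency. Mat, filled, and the 3-topic / 5-word sizes are hypothetical stand-ins, not scalda or Breeze code, and the sketch assumes that eStep multiplies a 1 x numTopics matrix derived from gamma by a numTopics x vocabulary matrix derived from lambda, which is what the stack trace suggests but which I have not verified in detail.

```scala
import scala.util.Try

object GammaShapeSketch {

  // Hypothetical stand-in for a dense matrix (row-major); not Breeze's DenseMatrix.
  final case class Mat(rows: Int, cols: Int, data: Vector[Double]) {
    require(data.length == rows * cols, "data length must equal rows * cols")

    def apply(r: Int, c: Int): Double = data(r * cols + c)

    // Matrix product with the same kind of shape check that fails in the stack trace.
    def *(that: Mat): Mat = {
      require(cols == that.rows, "Dimension mismatch!")
      val out = for {
        r <- 0 until rows
        c <- 0 until that.cols
      } yield (0 until cols).map(k => apply(r, k) * that(k, c)).sum
      Mat(rows, that.cols, out.toVector)
    }
  }

  // Helper that builds a rows x cols matrix filled with a constant.
  def filled(rows: Int, cols: Int, v: Double): Mat =
    Mat(rows, cols, Vector.fill(rows * cols)(v))

  // Assumed sizes: a model trained with 3 topics over a 5-word vocabulary.
  val lambda: Mat = filled(3, 5, 1.0) // numTopics x vocabSize

  // What LocalOnlineLda.empty leads to: numTopics = 0, so gamma is 1 x 0.
  val emptyGamma: Mat = filled(1, 0, 1.0)
  val failed: Try[Mat] = Try(emptyGamma * lambda)

  // Sizing gamma from the loaded model instead: 1 x lambda.rows.
  val fixedGamma: Mat = filled(1, lambda.rows, 1.0)
  val product: Mat = fixedGamma * lambda // shape 1 x 5

  def main(args: Array[String]): Unit = {
    println(s"empty gamma: $failed")
    println(s"fixed gamma: ${product.rows} x ${product.cols}")
  }
}
```

The 1 x 0 gamma fails the shape check before any arithmetic happens, while a gamma sized from lambda.rows multiplies cleanly. That is the only property the sketch demonstrates; it says nothing about whether the resulting topic proportions are otherwise correct.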

Maybe the Nitro team can tell us if this is the correct way to address the problem?

Thanks!