TommyJones / tidylda

Implements an algorithm for Latent Dirichlet Allocation using style conventions from the [tidyverse](https://style.tidyverse.org/) and [tidymodels](https://tidymodels.github.io/model-implementation-principles/index.html).

Implement additional uses for refit.tidylda() #62

Closed by TommyJones 2 years ago

TommyJones commented 2 years ago

For on-line learning, enable new_data to be a single row.

For continued training of an existing model, allow new_data to be NULL.

TommyJones commented 2 years ago

It turns out that on-line learning worked already. Just had to make a tweak to summarize_topics for a cleaner UX.

For continued training, this is a "wontfix" issue. The tidylda object retains only Cd and Cv, not Zv, and Zv is necessary to continue training. So one needs the original DTM to continue training: just pass it as new_data. In other words, this use case is already supported if you still have the original training data set. Nothing for me to do.
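To illustrate why the counts alone are not enough, here is a minimal Python sketch. It is not tidylda code. It assumes Cd and Cv are the document-by-topic and topic-by-token count matrices and that Zv stands for the per-token topic assignments the sampler tracks; the function and variable names are hypothetical. It shows two different sets of assignments that produce the same DTM, the same Cd, and the same Cv, so a sampler that has to decrement the count for a token's current topic cannot tell which one it was in.

```python
def count_matrices(tokens, n_docs, n_topics, n_vocab):
    """Aggregate (doc, word, topic) triples into Cd and Cv count matrices.

    Hypothetical helper for illustration only, not part of tidylda.
    """
    cd = [[0] * n_topics for _ in range(n_docs)]   # documents x topics
    cv = [[0] * n_vocab for _ in range(n_topics)]  # topics x vocabulary
    for doc, word, topic in tokens:
        cd[doc][topic] += 1
        cv[topic][word] += 1
    return cd, cv


def document_term_matrix(tokens, n_docs, n_vocab):
    """Aggregate the same triples into a DTM, ignoring topic assignments."""
    dtm = [[0] * n_vocab for _ in range(n_docs)]
    for doc, word, _topic in tokens:
        dtm[doc][word] += 1
    return dtm


# Two documents, two words, two topics. Each triple is (doc, word, topic).
assignments_a = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
# Same tokens, but every token's topic is flipped.
assignments_b = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1)]

counts_a = count_matrices(assignments_a, n_docs=2, n_topics=2, n_vocab=2)
counts_b = count_matrices(assignments_b, n_docs=2, n_topics=2, n_vocab=2)

same_dtm = (document_term_matrix(assignments_a, 2, 2)
            == document_term_matrix(assignments_b, 2, 2))
same_counts = counts_a == counts_b
same_assignments = assignments_a == assignments_b

print(same_dtm, same_counts, same_assignments)
```

Under these assumptions the DTM and both count matrices match while the token-level assignments differ, which is consistent with needing to pass the original DTM back in as new_data so that assignments can be re-initialized rather than recovered from the stored counts.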

Will submit a PR and close shortly.

TommyJones commented 2 years ago

https://github.com/TommyJones/tidylda/commit/2db222f0e908fc1509c16c9cd068c75319d0bc67