Implements an algorithm for Latent Dirichlet Allocation using style conventions from the [tidyverse](https://style.tidyverse.org/) and [tidymodels](https://tidymodels.github.io/model-implementation-principles/index.html).
Parallel approximate Gibbs, as implemented in 5549ce1, presents three problems:
1. Parallel sampling: the current implementation calls the R API from multiple threads, which is unstable and a deal breaker for CRAN. Most fixes I can think of further increase code complexity, make it harder to respect R's `set.seed()`, and add dependencies.
2. Model quality: I see a big drop-off in R-squared and coherence when fitting models with parallel Gibbs, though the top words in each topic look OK on visual inspection.
3. Single-threaded speed: on my (very powerful) Ubuntu 20.04 machine, the single-threaded sampler is slower than on my MacBook. Before this change, it was blazingly fast, at least for single-threaded models.
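To make the parallel sampling problem above concrete, here is a minimal sketch of one possible fix, not the code in this package. It assumes the workers only need a stream of random topic indices. The function name `sample_parallel` and all of its parameters are hypothetical. A master seed is expanded into one seed per worker on the calling thread, and each worker then uses its own standard-library generator, so nothing inside a thread touches the R API.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <thread>
#include <vector>

// Hypothetical helper (not part of this package): each of `n_threads`
// workers draws `n_per_thread` topic indices in [0, n_topics).
// No R API call happens inside a worker thread.
std::vector<std::vector<int>> sample_parallel(std::uint32_t master_seed,
                                              std::size_t n_threads,
                                              std::size_t n_per_thread,
                                              int n_topics) {
  // Expand the master seed into one seed per worker, on the calling thread.
  std::seed_seq seq{master_seed};
  std::vector<std::uint32_t> seeds(n_threads);
  seq.generate(seeds.begin(), seeds.end());

  // Pre-size the output so each worker writes only to its own slot.
  std::vector<std::vector<int>> out(n_threads);
  std::vector<std::thread> workers;
  workers.reserve(n_threads);

  for (std::size_t t = 0; t < n_threads; ++t) {
    workers.emplace_back([&out, &seeds, t, n_per_thread, n_topics] {
      std::mt19937 rng(seeds[t]);  // thread-local generator
      std::uniform_int_distribution<int> topic(0, n_topics - 1);
      out[t].reserve(n_per_thread);
      for (std::size_t i = 0; i < n_per_thread; ++i) {
        out[t].push_back(topic(rng));
      }
    });
  }
  for (auto& w : workers) {
    w.join();
  }
  return out;
}
```

In an Rcpp setting, the master seed could presumably be drawn once from R's RNG on the main thread so that `set.seed()` still controls the result. The draws themselves would no longer come from R's random stream, though, and results would depend on the number of threads. That is part of the added complexity described above.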
This package has been nearly ready for a year and still isn't on CRAN. My goal is to revert this change and then get the package on CRAN with a message that the API is still unstable. I will then (well, in parallel, no pun intended) work on the Rust implementation of WarpLDA with the unique features I've added to this Gibbs sampler. A future version will either use only the Rust implementation or offer the option to switch the engine to the WarpLDA sampler.
Note that this will effectively nullify the need to address #20 and possibly #41.