TommyJones / tidylda

Implements an algorithm for Latent Dirichlet Allocation using style conventions from the [tidyverse](https://style.tidyverse.org/) and [tidymodels](https://tidymodels.github.io/model-implementation-principles/index.html).

Properly handle rng for parallel sampling #20

Closed TommyJones closed 3 years ago

TommyJones commented 4 years ago

This affects both create_lexicon and fit_lda_c.

See https://www.pcg-random.org/posts/critiquing-pcg-streams.html for an example.

TommyJones commented 4 years ago

Need to verify two things:

  1. The sampler does not touch the R API while running, because the R API is single-threaded.
  2. The sampler respects random seeds set in the R environment.
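
One way to satisfy both points can be sketched in plain C++. This is a minimal sketch under stated assumptions, not the code in tidylda: the names `worker_draws` and `sample_in_parallel` are hypothetical, and `master_seed` stands in for a single value drawn from R's RNG on the main thread (so it follows `set.seed`) before any worker starts. Each worker then builds its own engine from `(master_seed, worker_id)` and never calls back into R. Note that seeding `std::mt19937_64` through `std::seed_seq` gives each worker a distinct, reproducible stream, but it does not formally guarantee statistically independent streams.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <thread>
#include <vector>

// Hypothetical helper: draws for one worker, using only that worker's engine.
// master_seed is assumed to have been drawn from R's RNG on the main thread
// before the parallel region; nothing in here touches the R API.
std::vector<double> worker_draws(std::uint32_t master_seed,
                                 std::uint32_t worker_id,
                                 std::size_t n_draws) {
  // Mix the shared seed with the worker id so each worker gets its own stream.
  std::seed_seq seq{master_seed, worker_id};
  std::mt19937_64 engine(seq);
  std::uniform_real_distribution<double> unif(0.0, 1.0);

  std::vector<double> draws(n_draws);
  for (std::size_t i = 0; i < n_draws; ++i) {
    draws[i] = unif(engine);
  }
  return draws;
}

// Hypothetical driver: one thread per worker, each writing only to its own slot,
// so the result does not depend on how the threads are scheduled.
std::vector<std::vector<double>> sample_in_parallel(std::uint32_t master_seed,
                                                    std::size_t n_workers,
                                                    std::size_t n_draws) {
  std::vector<std::vector<double>> out(n_workers);
  std::vector<std::thread> workers;
  workers.reserve(n_workers);

  for (std::size_t t = 0; t < n_workers; ++t) {
    workers.emplace_back([&out, master_seed, n_draws, t]() {
      out[t] = worker_draws(master_seed, static_cast<std::uint32_t>(t), n_draws);
    });
  }
  for (auto& w : workers) {
    w.join();
  }
  return out;
}
```

With this layout, two runs with the same `master_seed` and the same number of workers give identical draws, and changing the seed in R would change every worker's stream. Changing the number of workers changes how many streams exist, so results are only reproducible for a fixed worker count.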

TommyJones commented 4 years ago

Some examples: https://stackoverflow.com/questions/54142833/these-samplers-cannot-be-used-in-parallelized-code

Another example: https://twitter.com/coolbutuseless/status/1280820794004832264?s=12

Discussion of different ways to seed: https://rstudio-pubs-static.s3.amazonaws.com/225931_1d8d1b05d56b4caabc317135e6798bc6.html

TommyJones commented 4 years ago

Note that you also have to respect R's set.seed.

Some notes here: https://gallery.rcpp.org/articles/random-number-generation/

This is also an issue with my approach to #40.

TommyJones commented 3 years ago

Now moot due to #48