Closed TommyJones closed 4 years ago
When we get to the main Gibbs iteration for loop, we'd have an "if" statement that runs a single threaded Gibbs sampler or multi-threaded approximate Gibbs sampler, depending on how many CPUs the user chooses.
The multi-threaded sampler would also need to divide documents and Cd into batches and provide a batch copy of Cv and Ck. We'd parallelize over batches instead of documents. Then, within each batch, call the main sampling function called out in #10 passing the local objects. After each loop, we update Cv and Ck globally and copy those back to the local batches. (And we'd calc likelihood/optimize alpha/aggregate counts after burnin if the options are set.)
Another idea, possibly not a good one:
run parallel without creating batches or special objects. (Still need to transpose Cd for column major or make it a list.) Instead of updating a vector of topic probabilities (qz
), initialize one on the fly for each thread and calculate the probabilities based on current values of Cd, Cv, and Ck.
Three problems I see with this:
The best way to handle (1) and (3) is to communicate to the user explicitly that if they need determinism to only use the single threaded version.
Best I've got for (2) is pray it doesn't happen. I could put in an "if" to say don't decrement if it's zero, but that could make the counts get off in a way that is impossible to fix. I could also put in a pause, but (a) there's no guarantee an entry would be non-zero after waiting and (b) it might never terminate if it reaches zero and just stays there. i.e. the program will keep waiting and never stop running
The idea above failed about as hard as it could have. First run crashed my R session. Subsequent runs pretty much always threw a negative probabilities errors, probably due to collisions where the same slot was subtracted more than once, resulting in Cv(k,v)
going negative.
Trying a different approach...
@TommyJones, don't forget to verify that setting a random seed still works even with multithreading.
Gibbs is parallel as of 5549ce180054729f552e32725fe2d009ebe18b93
I'm going to open a new issue about verifying the sampler for multithreaded processes
Make it run kind of like Mallet using Newman et al.'s algo
Also, keep the sequential algo in case someone sets to one core