bnosac / textplot

Text Plots
GNU General Public License v2.0
Biterm most frequent topic filtering #7

Closed BenoitFayolle closed 2 years ago

BenoitFayolle commented 2 years ago

https://github.com/bnosac/textplot/blob/d0c40fb84738c0588a4e08784d30eabea26dd52a/R/textplot_biterms.R#L230-L231 Correct me if I'm wrong, but these two lines don't actually pick the best (most frequent) topic. topic_freq gives the number of occurrences of each biterm in the whole corpus, since topic is not included in the by argument of the first line. Hence the second line picks the maximum of a variable that is constant within each group.
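To make the point concrete, here is a minimal sketch on made-up toy data. It is not the package's code: it assumes data.table (suggested by the by argument mentioned above), and the column names term1, term2 and topic, as well as the object names, are hypothetical.

```r
library(data.table)

# Hypothetical toy data: one row per biterm occurrence with its assigned topic.
# The biterm (a, b) occurs once in topic 1 and three times in topic 2.
biterms <- data.table(
  term1 = c("a", "a", "a", "a", "x"),
  term2 = c("b", "b", "b", "b", "y"),
  topic = c(1L, 2L, 2L, 2L, 1L)
)

# Counting WITHOUT topic in `by`: topic_freq is the corpus-wide biterm count,
# so it is constant within each (term1, term2) group and which.max() always
# returns the first row of the group.
buggy <- copy(biterms)
buggy[, topic_freq := .N, by = .(term1, term2)]
buggy_best <- buggy[, .(topic = topic[which.max(topic_freq)]),
                    by = .(term1, term2)]

# Counting WITH topic in `by`: topic_freq now varies per topic within a biterm,
# so which.max() selects a row belonging to the most frequent topic.
fixed <- copy(biterms)
fixed[, topic_freq := .N, by = .(term1, term2, topic)]
fixed_best <- fixed[, .(topic = topic[which.max(topic_freq)]),
                    by = .(term1, term2)]

print(buggy_best)
print(fixed_best)
```

In this sketch the first variant reports whichever topic happens to come first for the biterm (a, b), while the second reports the topic in which that biterm occurs most often.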

jwijffels commented 2 years ago

I tried to speed up the logic which I originally implemented at https://github.com/bnosac/textplot/blob/d0c40fb84738c0588a4e08784d30eabea26dd52a/R/textplot_biterms.R#L232-L233 but the speedup was not implemented correctly. Later on, I make sure that only biterms with terms highly emitted by each topic are shown, at https://github.com/bnosac/textplot/blob/d0c40fb84738c0588a4e08784d30eabea26dd52a/R/textplot_biterms.R#L246 This was done to make the graph crisp. So clearly a bug, but probably one that does not occur much unless you really have completely overlapping topics.

BenoitFayolle commented 2 years ago

I think you are responding to my other issue, but this one is different. I can send a reprex tomorrow.

BenoitFayolle commented 2 years ago

Never mind, I just saw your commit to fix this issue 👍

jwijffels commented 2 years ago

I pushed the package on CRAN just now.