avkoehl closed this issue 4 years ago
The next step will be creating the topic model tables in SQL once we have decided on a model.
Don't have permissions to add to the directory
Should be good now
Added LDAvis for the 75- and 90-topic models.
@sampizelo What do we think? 90 or 75 topics?
@avkoehl I think 75 for now. It wasn't quite as "optimal" as 90, but it was still a clear breakpoint, it will be much less visually cluttered, and I think it will make a lot more sense to people who are newer to topic models.
@avkoehl And looking at the LDAvis now... is it possible to just not show topic 59 at all, and make it a 74-topic model? I'm not sure what difficulties that would cause, but it's all in Latin, so we don't really want to see it anyway, and it's hugely skewing our plot. (Also FWIW - they aren't just Latin words in general, but Latin stop words specifically - est, qui, quod, cum, hoc, etc. We would still be leaving some other topics with useful Latin words in them).
I can also rerun the clustering optimizer on topic terms without #59 and see if that changes anything.
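A rough sketch of what dropping one topic could look like before building the visualization, assuming the model output is already loaded as a topic-term matrix and a doc-topic matrix (plain lists here). The function name `drop_topic` and the toy numbers are hypothetical, and whether "topic 59" is 0-based or 1-based depends on the tool that labeled it, so the index passed in needs checking.

```python
def drop_topic(topic_term, doc_topic, topic_idx):
    """Remove one topic (0-based index) and renormalize each document's weights."""
    # Keep every topic-term row except the dropped topic.
    kept_topic_term = [row for k, row in enumerate(topic_term) if k != topic_idx]

    kept_doc_topic = []
    for weights in doc_topic:
        rest = [w for k, w in enumerate(weights) if k != topic_idx]
        total = sum(rest)
        # Rescale so the remaining topic proportions sum to 1 again.
        # A document that was entirely in the dropped topic is left as zeros.
        if total > 0:
            rest = [w / total for w in rest]
        kept_doc_topic.append(rest)
    return kept_topic_term, kept_doc_topic


# Toy data (made up): 3 topics x 2 terms, and 2 documents x 3 topics.
toy_topic_term = [[0.5, 0.5], [0.9, 0.1], [0.2, 0.8]]
toy_doc_topic = [[0.2, 0.5, 0.3], [0.6, 0.0, 0.4]]

new_topic_term, new_doc_topic = drop_topic(toy_topic_term, toy_doc_topic, 1)
```

This only hides the topic after the fact; the documents that were mostly Latin stop words still sit in the corpus, so retraining without them may give a different 74-topic model.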
For future reference (not sure where to put this) - looks like snowball supports Latin stopwords as well: https://www.rdocumentation.org/packages/stopwords/versions/1.0. We should consider incorporating Latin, French, Irish, Scottish, and German stopword filters into our workflow at some point in the future (not a priority).
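As a placeholder for that future step, a minimal sketch of a per-language stopword filter. The Latin list contains only the words quoted earlier in this thread, the French list is a tiny illustrative sample, and the names `STOPWORDS` and `remove_stopwords` are hypothetical; a real workflow would load full lists from a stopword package instead.

```python
# Illustrative lists only. "la" holds the Latin words quoted in this thread;
# "fr" is a small made-up sample, not a complete French list.
STOPWORDS = {
    "la": {"est", "qui", "quod", "cum", "hoc"},
    "fr": {"le", "les", "et", "de"},
}


def remove_stopwords(tokens, languages):
    """Drop tokens that appear in the stopword list of any requested language."""
    blocked = set()
    for lang in languages:
        # Unknown language codes contribute nothing rather than raising.
        blocked |= STOPWORDS.get(lang, set())
    return [t for t in tokens if t.lower() not in blocked]


filtered = remove_stopwords(["Hoc", "est", "corpus", "meum"], ["la"])
```

Filtering before training would keep pure stop-word topics from forming at all, while leaving topics built on useful Latin vocabulary intact.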
Motivation
We want to be able to look at the outputs of the two models we just ran.
Task
Use malletparse or RMallet to read the results of the topic model. Then create the required LDAvis JSON (try not to reorder topics by size). Add it to the directory hosted at: https://datalab.ucdavis.edu/text-reports/archive_text_reports/quintessence/lda/
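For anyone working in Python rather than R, the task could be sketched roughly as below. This is not malletparse or RMallet; it is a stand-in that assumes a doc-topics file in the dense tab-separated layout (document index, document name, then one proportion per topic), which older MALLET versions do not use. The helper names are hypothetical. The second function relies on the third-party pyLDAvis package, whose `prepare` function takes `sort_topics=False` to keep the model's topic order instead of reordering by size.

```python
import io


def parse_doc_topics(lines):
    """Parse dense doc-topics rows: index, name, then one proportion per topic."""
    names, doc_topic = [], []
    for line in lines:
        line = line.strip()
        # Skip blank lines and any '#' header line.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        names.append(fields[1])
        doc_topic.append([float(x) for x in fields[2:]])
    return names, doc_topic


def write_ldavis_json(topic_term, doc_topic, doc_lengths, vocab, term_freq, path):
    """Write LDAvis JSON without reordering topics (needs pyLDAvis installed)."""
    import pyLDAvis  # third-party; imported here so the parser works without it

    prepared = pyLDAvis.prepare(
        topic_term, doc_topic, doc_lengths, vocab, term_freq,
        sort_topics=False,  # keep the original topic numbering
    )
    with open(path, "w") as fh:
        pyLDAvis.save_json(prepared, fh)


# Made-up two-document, two-topic sample in the assumed layout.
sample = "0\tdoc_a\t0.25\t0.75\n1\tdoc_b\t0.5\t0.5\n"
sample_names, sample_doc_topic = parse_doc_topics(io.StringIO(sample))
```

Keeping the topic order fixed means topic numbers in the plot should line up with the topic IDs that later go into the SQL tables, apart from LDAvis numbering topics from 1 by default.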