cpsievert / LDAvis

R package for web-based interactive topic model visualization.

Definition of doc.length and term.frequency #52

Closed Graybosch closed 8 years ago

Graybosch commented 8 years ago

Dear Carson,

Thank you for the excellent topic modelling visualization tool!

In the doc.length object, is the length of a document the number of unique tokens or the total number of tokens? For instance if my document is: {Larry, Larry, Larry} would the corresponding factor in doc.length be 3, or 1? I am guessing 1.

In the term.frequency object, is the count for a term the number of documents it appears in (that is, a sum over documents of values that are 1 if the term appears in the document and 0 otherwise), or is it simply the total number of times the term appears in any document?

Thank you again.

cpsievert commented 8 years ago

> For instance if my document is: {Larry, Larry, Larry} would the corresponding factor in doc.length be 3, or 1?

3

> is it simply the total number of times the term appears in any document?

Yes

FYI, we provide a definition for all the input terms in the vignette: browseVignettes("LDAvis")
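
To make the two answers above concrete, here is a minimal sketch of the counting they describe. It is written in plain Python rather than R and does not call LDAvis; the toy corpus and the variable names (`docs`, `doc_length`, `term_frequency`, `document_frequency`) are assumptions made up for illustration. It only shows that doc.length counts every token including repeats, and that term.frequency sums all occurrences across documents rather than counting documents.

```python
from collections import Counter

# Hypothetical toy corpus: each document is a list of tokens.
docs = [
    ["Larry", "Larry", "Larry"],
    ["Larry", "Moe", "Curly", "Moe"],
]

# doc.length: total number of tokens in each document, repeats included.
# The {Larry, Larry, Larry} document therefore has length 3, not 1.
doc_length = [len(doc) for doc in docs]

# term.frequency: total number of occurrences of each term over all documents.
term_frequency = Counter(token for doc in docs for token in doc)

# For contrast only: the number of documents containing each term.
# This is the 0/1-per-document sum from the question, which is NOT
# the definition confirmed in the reply.
document_frequency = Counter(token for doc in docs for token in set(doc))

print(doc_length)                   # [3, 4]
print(term_frequency["Larry"])      # 4
print(document_frequency["Larry"])  # 2
```

In this toy corpus, where every token belongs to the vocabulary, the two quantities describe the same token counts from different directions, so `sum(doc_length)` equals `sum(term_frequency.values())`.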