Issue
The regex counter counts matches only within individual tokens, so it misses multi-word matches.
Description of changes
As discussed with @ajdapretnar, I added a dropdown beside each statistic so that the user can choose whether it is computed on tokens/n-grams or on the full document. The dropdown currently has two options:
Preprocessed tokens - the statistic is computed on either tokens or n-grams, depending on which is more suitable for the statistic.
Documents - the statistic is computed on the full document text.
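
To illustrate why the choice matters for the regex counter, here is a minimal sketch, not the widget's actual code. The function `count_regex`, its `source` parameter, and the sample data are hypothetical names used only for this example.

```python
import re

def count_regex(pattern, documents, tokens, source="Documents"):
    """Count regex matches per document.

    source="Documents" searches the full document text;
    any other value searches each token separately.
    (Hypothetical helper, for illustration only.)
    """
    rx = re.compile(pattern)
    if source == "Documents":
        return [sum(1 for _ in rx.finditer(doc)) for doc in documents]
    # Each token is matched on its own, so a pattern that spans
    # several words can never match here.
    return [
        sum(1 for tok in toks for _ in rx.finditer(tok))
        for toks in tokens
    ]

docs = ["new york is big", "york and new york"]
toks = [doc.split() for doc in docs]  # naive whitespace tokenization

on_documents = count_regex(r"new york", docs, toks, source="Documents")
on_tokens = count_regex(r"new york", docs, toks, source="Tokens")
```

With this sample data, the multi-word pattern is found once per document when searching the full text and never when searching individual tokens, which is the behaviour described in the issue.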
Discussion
~Is Preprocessed tokens a good term, or do we have any other idea?~ Changed to Tokens
~Average word length is currently implemented only on documents, since the name doesn't make sense for n-grams. Should we rename it to Average term length and apply it to both documents and n-grams, so that it measures word length on documents and n-gram length on n-grams?~ Renamed to Average term length and enabled for n-grams.