bootphon / wordseg

A Python toolbox for text based word segmentation
https://docs.cognitive-ml.fr/wordseg
GNU General Public License v3.0

descriptive stats definitions: what is unique? #18

Closed (alecristia closed this issue 6 years ago)

alecristia commented 6 years ago

Is it odd that uniques = 1 for phones, and that nword_hapax, uniques for syllables, and uniques for words are all equal to 83 in the sample below? Does "uniques" mean the number of types that occur exactly once?

sample

{
  "phones": {
    "tokens/word": 3.0965637233579817,
    "uniques": 1,
    "token/types": 177.975,
    "tokens": 7119,
    "tokens/syllable": 2.500526870389884,
    "tokens/utt": 11.175824175824175,
    "types": 40
  },
  "corpus": {
    "nword_hapax": 83,
    "nword_types": 301,
    "mattr": 0.7118829183049362,
    "nutts_single_word": 124,
    "entropy": 0.023474054557757,
    "nutts": 637,
    "nword_tokens": 2299
  },
  "syllables": {
    "tokens/word": 1.23836450630709,
    "uniques": 83,
    "token/types": 8.398230088495575,
    "tokens": 2847,
    "tokens/utt": 4.469387755102041,
    "types": 339
  },
  "words": {
    "tokens": 2299,
    "tokens/utt": 3.609105180533752,
    "uniques": 83,
    "token/types": 7.637873754152824,
    "types": 301
  }
}
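To make the definition in the question concrete, here is a minimal sketch of counting tokens, types, and hapaxes, assuming "unique" means a type that occurs exactly once. This is not the wordseg-stats implementation; `count_hapaxes` is a hypothetical helper and the toy word list is made up for illustration.

```python
from collections import Counter


def count_hapaxes(tokens):
    """Return the number of types that occur exactly once in `tokens`.

    Hypothetical helper for illustration, not part of wordseg.
    """
    counts = Counter(tokens)
    return sum(1 for c in counts.values() if c == 1)


# Toy data, not taken from the corpus in the sample above.
words = "the dog saw the cat and the bird".split()

n_tokens = len(words)            # every occurrence counts
n_types = len(set(words))        # distinct word forms
n_hapax = count_hapaxes(words)   # types seen exactly once

print(n_tokens, n_types, n_hapax)
```

Under that reading, uniques = 1 for phones would mean that only one of the 40 phone types occurs a single time among the 7119 phone tokens.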

mmmaat commented 6 years ago
alecristia commented 6 years ago
mmmaat commented 6 years ago

OK Alex, I'll modify wordseg-stats to match those specs and let you know!