I think there is an error in the unique n-gram class: the unique n-gram ratio should be computed as #unique_grams / #grams, but the code divides by the number of sentences instead:

    def get_ng(self):
        document = self.get_reference()
        # Is this a bug? This is the number of sentences, but the ratio should
        # divide the number of unique n-grams by the total number of n-grams.
        length = len(document)
        grams = list()
        for sentence in document:
            grams += self.get_gram(sentence)
        print(grams, len(set(grams)), len(grams))  # debug: all grams, #unique, #total
        # Divides by the number of sentences, not by the number of grams;
        # the right computation should use len(grams) instead of length.
        return len(set(grams)) / length
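A minimal standalone sketch of the proposed fix, assuming `get_gram` returns a list of hashable n-grams (for example tuples of tokens) for each tokenized sentence. The `ngrams` helper and the `unique_ngram_ratio` function are hypothetical stand-ins for illustration, not the class's actual API:

```python
from typing import List, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    # Hypothetical stand-in for self.get_gram: all contiguous n-grams of one sentence.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def unique_ngram_ratio(document: Sequence[Sequence[str]], n: int = 2) -> float:
    grams: List[Tuple[str, ...]] = []
    for sentence in document:
        grams += ngrams(sentence, n)
    if not grams:
        return 0.0  # assumption: define the ratio as 0 when there are no n-grams
    # Divide by the total number of n-grams, not by the number of sentences.
    return len(set(grams)) / len(grams)


doc = [["the", "cat", "sat"], ["the", "cat", "ran"]]
print(unique_ngram_ratio(doc, n=2))
```

On this two-sentence example there are 4 bigrams, 3 of them unique, so the ratio is 0.75. The current code divides the same 3 unique bigrams by the 2 sentences and returns 1.5, which is not even bounded by 1.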