geek-ai / Texygen

A text generation benchmarking platform
MIT License

Unique n-gram: division by length of document #51

Open alessandropec opened 3 years ago

alessandropec commented 3 years ago

I think there is an error in the unique n-gram class. The correct computation should be #unique_grams / #grams, but the code divides the number of unique n-grams by the number of sentences in the document:
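In symbols, write $S$ for the sentences returned by `get_reference()` and $G$ for the list of all n-grams collected from them (duplicates kept). The code currently returns the first quantity below, and this issue proposes the second. Only the second is guaranteed to lie in $[0, 1]$, because $\operatorname{set}(G)$ can never have more elements than $G$:

$$
\text{current} = \frac{|\operatorname{set}(G)|}{|S|},
\qquad
\text{proposed} = \frac{|\operatorname{set}(G)|}{|G|}
$$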

```python
def get_ng(self):
    document = self.get_reference()
    length = len(document)  # Bug? This is the number of sentences, not the number of n-grams.
    grams = list()
    for sentence in document:
        grams += self.get_gram(sentence)
    print(grams, len(set(grams)), len(grams))

    # The unique n-gram ratio should divide by the number of n-grams, not the number of sentences.
    return len(set(grams)) / length  # should use len(grams) instead of length
```
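A minimal standalone sketch of the proposed fix follows, contrasting it with the current behaviour. It assumes each sentence is a list of tokens and that `get_gram` returns every contiguous n-gram of a sentence, duplicates included. The helper names (`extract_ngrams`, `collect_ngrams`, `ratio_per_sentence`, `unique_ngram_ratio`) are hypothetical and are not part of Texygen. Returning 0.0 for a document with no n-grams is also an assumption.

```python
from typing import Hashable, List, Sequence, Tuple

Gram = Tuple[Hashable, ...]
Document = Sequence[Sequence[Hashable]]


def extract_ngrams(tokens: Sequence[Hashable], n: int) -> List[Gram]:
    """Hypothetical stand-in for get_gram: every contiguous n-gram of one sentence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def collect_ngrams(document: Document, n: int) -> List[Gram]:
    """Concatenate the n-grams of all sentences, keeping duplicates."""
    grams: List[Gram] = []
    for sentence in document:
        grams += extract_ngrams(sentence, n)
    return grams


def ratio_per_sentence(document: Document, n: int) -> float:
    """Current behaviour: distinct n-grams divided by the number of sentences."""
    return len(set(collect_ngrams(document, n))) / len(document)


def unique_ngram_ratio(document: Document, n: int) -> float:
    """Proposed fix: distinct n-grams divided by the total number of n-grams."""
    grams = collect_ngrams(document, n)
    if not grams:
        return 0.0  # assumption: a document with no n-grams scores 0
    return len(set(grams)) / len(grams)


doc = [["the", "cat", "sat"], ["the", "cat", "ran"]]
print(ratio_per_sentence(doc, 2))  # 1.5: 3 distinct bigrams / 2 sentences
print(unique_ngram_ratio(doc, 2))  # 0.75: 3 distinct bigrams / 4 bigrams
```

With two short sentences, dividing by the sentence count already gives a value above 1. Dividing by the total n-gram count keeps the score a proportion of distinct n-grams.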