fchollet / deep-learning-with-python-notebooks

Jupyter notebooks for the code samples of the book "Deep Learning with Python"
MIT License
18.59k stars, 8.63k forks

Ch11 Understanding TF-IDF normalization #230

Open intelligencethink opened 11 months ago

intelligencethink commented 11 months ago

The explanation of TF-IDF on page 326 is shown below.

def tfidf(term, document, dataset):
    term_freq = document.count(term)
    doc_freq = math.log(sum(doc.count(term) for doc in dataset) + 1)
    return term_freq / doc_freq

Is this correct? In this formula, the total number of documents in the dataset does not appear anywhere in doc_freq.
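
To make the question concrete, here is a minimal sketch that puts the quoted snippet next to the common textbook form, tf * log(N / df), where N is the number of documents and df is the number of documents containing the term. The names `tfidf_book` and `tfidf_classic`, the toy `dataset`, and the assumption that each document is a list of tokens are all mine, chosen for illustration; they are not from the book.

```python
import math


def tfidf_book(term, document, dataset):
    # The snippet as quoted from the book: the term count in this document,
    # divided by the log of (total occurrences of the term in the dataset + 1).
    # Note: this divides by zero if the term occurs nowhere in the dataset.
    term_freq = document.count(term)
    doc_freq = math.log(sum(doc.count(term) for doc in dataset) + 1)
    return term_freq / doc_freq


def tfidf_classic(term, document, dataset):
    # Textbook variant, shown only for comparison (an assumption, not the
    # book's code): tf * log(N / df), with N = number of documents and
    # df = number of documents that contain the term at least once.
    term_freq = document.count(term)
    num_docs = len(dataset)
    docs_with_term = sum(1 for doc in dataset if term in doc)
    if docs_with_term == 0:
        return 0.0
    return term_freq * math.log(num_docs / docs_with_term)


# Hypothetical toy dataset: each document is a list of tokens.
dataset = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
]

for term in ("the", "cat"):
    print(term, tfidf_book(term, dataset[0], dataset),
          tfidf_classic(term, dataset[0], dataset))
```

In this toy example "the" occurs in every document, so the textbook variant scores it exactly 0, while the quoted snippet still gives it a non-zero score. That difference comes from the missing N, which is what I am asking about.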