chartbeat-labs / textacy

NLP, before and after spaCy
https://textacy.readthedocs.io

Typo in apply_idf_weighting causing error #190

Closed idc9 closed 6 years ago

idc9 commented 6 years ago

There is a typo on line 193 of textacy.vsm.matrix_utils.apply_idf_weighting() that causes the function to raise a TypeError.

Problem and solution

When apply_idf_weighting calls get_inverse_doc_freqs, it passes the keyword argument idf_type, but get_inverse_doc_freqs expects the keyword argument type_ (line 193).

In other words, this line currently reads

idfs = get_inverse_doc_freqs(doc_term_matrix, idf_type=idf_type)

but it should read

idfs = get_inverse_doc_freqs(doc_term_matrix, type_=idf_type)
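
To make the mismatch concrete, here is a minimal, self-contained sketch of the same failure mode. The functions below are hypothetical stand-ins, not textacy's implementation: only the keyword name type_ and the idf_type parameter of the caller come from this report. The "smooth" default, the IDF formula, and the list-of-lists matrix are assumptions chosen to keep the example dependency-free.

```python
import math


def get_inverse_doc_freqs(doc_term_matrix, type_="smooth"):
    """Hypothetical stand-in: only the keyword name `type_` is from the report."""
    if type_ != "smooth":
        raise ValueError("this sketch only supports type_='smooth'")
    n_docs = len(doc_term_matrix)
    n_terms = len(doc_term_matrix[0])
    # Document frequency: number of rows in which each term has a nonzero count.
    dfs = [sum(1 for row in doc_term_matrix if row[j] > 0) for j in range(n_terms)]
    # Illustrative smoothed IDF; assumed, not taken from textacy.
    return [math.log((1 + n_docs) / (1 + df)) + 1.0 for df in dfs]


def apply_idf_weighting_buggy(doc_term_matrix, idf_type="smooth"):
    # Wrong keyword name: the callee has no parameter called `idf_type`.
    idfs = get_inverse_doc_freqs(doc_term_matrix, idf_type=idf_type)
    return [[v * w for v, w in zip(row, idfs)] for row in doc_term_matrix]


def apply_idf_weighting_fixed(doc_term_matrix, idf_type="smooth"):
    # Correct keyword name: forward the caller's `idf_type` as `type_`.
    idfs = get_inverse_doc_freqs(doc_term_matrix, type_=idf_type)
    return [[v * w for v, w in zip(row, idfs)] for row in doc_term_matrix]


matrix = [[1, 0], [1, 2]]  # 2 docs x 2 terms, toy counts

try:
    apply_idf_weighting_buggy(matrix)
    error_message = None
except TypeError as err:
    error_message = str(err)  # mentions the unexpected keyword 'idf_type'

weighted = apply_idf_weighting_fixed(matrix)
```

The buggy variant fails at call time regardless of the matrix contents, because Python rejects the unknown keyword before the callee's body runs. The fixed variant differs only in the keyword name.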

Steps to Reproduce (for bugs)

The following code

from scipy.sparse import rand
from textacy.vsm.matrix_utils import apply_idf_weighting

M = rand(m=100, n=100)
apply_idf_weighting(M)

produces the following error

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-29-4b99cfa2e2c2> in <module>()
      3 
      4 M = rand(m=100, n=100)
----> 5 apply_idf_weighting(M)

/Users/iaincarmichael/anaconda/envs/py36/lib/python3.6/site-packages/textacy/vsm/matrix_utils.py in apply_idf_weighting(doc_term_matrix, idf_type)
    191         where value (i, j) is the tfidf weight of term j in doc i
    192     """
--> 193     idfs = get_inverse_doc_freqs(doc_term_matrix, idf_type=idf_type)
    194     return doc_term_matrix.dot(sp.diags(idfs, 0))
    195 

TypeError: get_inverse_doc_freqs() got an unexpected keyword argument 'idf_type'


bdewilde commented 6 years ago

Well, thanks for finding all my bugs and the rough edges of textacy's functionality, and sorry it has to be you. I'll fix this ASAP.