JasonKessler / scattertext

Beautiful visualizations of how language differs among document types.
Apache License 2.0

issue with corpus.get_term_freq_df() #53

Closed vaidyan5 closed 4 years ago

vaidyan5 commented 4 years ago


Attachment: df_test.csv.zip

corpus = st.CorpusFromPandas(df_test, category_col='loc_k', text_col='message', nlp=nlp).build().remove_terms(nlp.Defaults.stop_words, ignore_absences=True)
term_freq_df = corpus.get_term_freq_df()

This produces the error: TypeError: unsupported operand type(s) for +: 'int' and 'str'

JasonKessler commented 4 years ago

What is the full trace of the error?

vaidyan5 commented 4 years ago

TypeError                                 Traceback (most recent call last)
in
----> 1 term_freq_df = corpus.get_term_freq_df()
      2

~/anaconda3/lib/python3.7/site-packages/scattertext/TermDocMatrix.py in get_term_freq_df(self, label_append)
    105         return pd.DataFrame(mat,
    106                             index=pd.Series(self.get_terms(), name='term'),
--> 107                             columns=[c + label_append for c in self.get_categories()])
    108
    109     def get_term_freq_mat(self):

~/anaconda3/lib/python3.7/site-packages/scattertext/TermDocMatrix.py in <listcomp>(.0)
    105         return pd.DataFrame(mat,
    106                             index=pd.Series(self.get_terms(), name='term'),
--> 107                             columns=[c + label_append for c in self.get_categories()])
    108
    109     def get_term_freq_mat(self):

TypeError: unsupported operand type(s) for +: 'int' and 'str'

JasonKessler commented 4 years ago

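This error occurs when category names are not strings: get_term_freq_df() builds its column names as c + label_append, which fails when c is an integer, as with the values in loc_k. Version 0.0.2.61 fixes this issue.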
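
The failing expression from the trace can be reproduced without scattertext. The sketch below is only an illustration: the `categories` list stands in for integer labels like those in `loc_k`, and the `' freq'` suffix is inferred from the column names in the output further down. The `str(c)` conversion shows one way such a concatenation can be made safe; it is not necessarily how 0.0.2.61 implements the fix.

```python
# Stand-in for self.get_categories() when the category column holds integers.
categories = [458, 530, 484]
# Suffix assumed from the '458 freq' style column names in the fixed output.
label_append = ' freq'

try:
    # Same shape as line 107 in the trace: int + str is not defined.
    columns = [c + label_append for c in categories]
except TypeError as err:
    message = str(err)
    print(message)

# Converting each label to a string first makes the concatenation valid.
columns = [str(c) + label_append for c in categories]
print(columns)
```

With the fixed release, the original call succeeds and returns one frequency column per category: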

>>> corpus = st.CorpusFromPandas(df_test,category_col='loc_k',text_col='message',nlp=nlp).build().remove_terms(nlp.Defaults.stop_words, ignore_absences=True)
>>> corpus.get_term_freq_df()
                 458 freq  530 freq  484 freq  515 freq  507 freq  506 freq  467 freq
term
anybody                 1         0         0         0         0         0         0
interested              2         0         0         0         0         0         0
trading                 1         0         0         0         0         0         0
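
If upgrading is not an option, a possible workaround is to cast the category column to strings before building the corpus. This is a minimal sketch, assuming the categories live in an integer pandas column; the small `df_test` frame and its messages are made-up stand-ins for the attached CSV, and the scattertext call is left commented out because it needs a loaded spaCy model (`nlp`).

```python
import pandas as pd

# Hypothetical stand-in for the attached df_test.csv; only the column names
# 'loc_k' and 'message' come from the issue.
df_test = pd.DataFrame({
    'loc_k': [458, 530, 458],
    'message': ['anybody interested in trading',
                'a second example message',
                'interested in a trade'],
})

# Cast the integer category labels to strings so that string concatenation
# on category names no longer raises a TypeError.
df_test['loc_k'] = df_test['loc_k'].astype(str)

print(df_test['loc_k'].tolist())

# The corpus can then be built as in the original report:
# corpus = st.CorpusFromPandas(df_test, category_col='loc_k',
#                              text_col='message', nlp=nlp).build()
# term_freq_df = corpus.get_term_freq_df()
```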