JasonKessler / scattertext

Beautiful visualizations of how language differs among document types.
Apache License 2.0

Scattertext has an issue with my code! #3

Closed: shyamalschandra closed this issue 8 years ago

shyamalschandra commented 8 years ago

I took two text datasets and fed them into the boilerplate code shown in the Jupyter notebook example, but I am getting the following error:

Traceback (most recent call last):
  File "stextt.py", line 1, in <module>
    import scattertext as ST
  File "/Users/shyamalsuhanachandra/Desktop/scattertext.py", line 12, in <module>
AttributeError: 'module' object has no attribute 'TermDocMatrixFromPandas'

Do you know what could be the problem? What should I do?

Here is the code:

import scattertext as ST
import pandas as pd
import io
from IPython.display import IFrame
text1 = open("text1.txt", "r").read()
text2 = open("text2.txt", "r").read()
df = pd.DataFrame( [{'text': text.strip(), 'label': 'text1'} for text in text1.decode('utf-8', errors='ignore').split('\n')] + [{'text': text.strip(), 'label': 'text2'} for text in text2.decode('utf-8', errors='ignore').split('\n')]
)
term_doc_mat = ST.TermDocMatrixFromPandas(data_frame = df, category_col = 'label', text_col = 'text', nlp = ST.fast_but_crap_nlp ).build()
filtered_term_doc_mat = (ST.TermDocMatrixFilter(pmi_threshold_coef = 3, min_freq = 10).filter(term_doc_mat))
scatter_chart_data = (ST.ScatterChart(filtered_term_doc_mat).to_dict('text1', category_name='text1', not_category_name='text2'))
viz_data_adapter = ST.viz.VizDataAdapter(scatter_chart_data)
html = ST.viz.HTMLVisualizationAssembly(viz_data_adapter).to_html()
open('subj_obj_scatter.html', 'wb').write(html.encode('utf-8'))
IFrame(src='subj_obj_scatter.html', width = 1000, height=1000)

JasonKessler commented 8 years ago

I can't replicate this. How did you install the package?

Also, make sure that there's no file or folder named "scattertext.py" or "scattertext" in your working directory. A local file with that name shadows the installed package when Python resolves the import.
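
One way to check for this kind of shadowing is to ask Python where it would load a module from. The snippet below is a minimal sketch for Python 3 (the thread itself uses Python 2.7); the helper name `module_origin` is made up for illustration and is not part of scattertext.

```python
import importlib.util


def module_origin(name):
    """Return the path Python would import `name` from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # "json" is used here only because it is always available; substitute
    # "scattertext" to see whether a file on your Desktop or in the current
    # directory is being picked up instead of the pip-installed package.
    print(module_origin("json"))
```

If the printed path points into the working directory rather than site-packages, a local file or stale .pyc is likely taking precedence.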

shyamalschandra commented 8 years ago

I used pip.

Here is the error I am getting now that I removed the scattertext.pyc file from the cwd:

Traceback (most recent call last):
  File "stextt.py", line 13, in <module>
    filtered_term_doc_mat = (ST.TermDocMatrixFilter(pmi_threshold_coef = 3, min_freq = 10).filter(term_doc_mat))
TypeError: __init__() got an unexpected keyword argument 'min_freq'

I will look into the code later today. Thanks for responding so quickly!

JasonKessler commented 8 years ago

Ah. Looks like I forgot to update the parameter name in the example after changing it in a new version. I'll go ahead and fix the example. In the meantime, use minimum_term_freq instead of min_freq.
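
When an example and the installed version disagree about a keyword argument, the constructor's signature can be inspected directly. This is a rough sketch for Python 3 using `inspect.signature`; `ExampleFilter` is a hypothetical stand-in class, not scattertext's real `TermDocMatrixFilter`, and its default values are invented for illustration.

```python
import inspect


class ExampleFilter:
    """Hypothetical stand-in for a class whose keyword argument was renamed."""

    def __init__(self, pmi_threshold_coef=3, minimum_term_freq=3):
        self.pmi_threshold_coef = pmi_threshold_coef
        self.minimum_term_freq = minimum_term_freq


def accepted_kwargs(cls):
    """List the parameter names that cls.__init__ accepts, excluding self."""
    params = inspect.signature(cls.__init__).parameters
    return [name for name in params if name != "self"]


if __name__ == "__main__":
    # Passing the real class here instead would show which names the
    # installed version actually expects.
    print(accepted_kwargs(ExampleFilter))
```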

shyamalschandra commented 8 years ago

Okay, I changed the parameter name to minimum_term_freq instead of min_freq and reran the code and got the following error:

iMac:Desktop shyamalsuhanachandra$ python stextt.py 
Traceback (most recent call last):
  File "stextt.py", line 15, in <module>
    scatter_chart_data = (ST.ScatterChart(filtered_term_doc_mat).to_dict('text1', category_name='text1', not_category_name='text2')) 
  File "/usr/local/lib/python2.7/site-packages/scattertext/ScatterChart.py", line 61, in to_dict
    df = self._build_dataframe_for_drawing(all_categories, category, scores)
  File "/usr/local/lib/python2.7/site-packages/scattertext/ScatterChart.py", line 188, in _build_dataframe_for_drawing
    df[df[all_categories].sum(axis=1) > self.minimum_term_frequency],
  File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 1991, in __getitem__
    return self._getitem_array(key)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 2035, in _getitem_array
    indexer = self.ix._convert_to_indexer(key, axis=1)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/indexing.py", line 1214, in _convert_to_indexer
    raise KeyError('%s not in index' % objarr[mask])
KeyError: "['text1 freq'] not in index"

Any thoughts?

shyamalschandra commented 8 years ago

I changed the names to text1 and text2 and it runs successfully.
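
The KeyError above ("['text1 freq'] not in index") suggests that the category name passed to to_dict did not match any label in the data at that point, though the thread does not confirm the exact cause. A minimal sketch of a pre-flight check, in plain Python 3, follows; `check_category` and the sample rows are hypothetical and only mirror the list-of-dicts shape used to build the DataFrame earlier in the thread.

```python
def check_category(rows, category, label_key="label"):
    """Raise ValueError if `category` is not among the labels found in `rows`.

    Returns the sorted list of labels so the caller can see what is available.
    """
    labels = {row[label_key] for row in rows}
    if category not in labels:
        raise ValueError(
            "category %r not found; available labels: %s"
            % (category, sorted(labels))
        )
    return sorted(labels)


if __name__ == "__main__":
    # Illustrative rows only; real rows would come from text1.txt and text2.txt.
    rows = [
        {"text": "first line", "label": "text1"},
        {"text": "second line", "label": "text2"},
    ]
    print(check_category(rows, "text1"))
```

Running a check like this before building the chart turns a deep pandas KeyError into a message that names the labels actually present.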