JasonKessler / scattertext

Beautiful visualizations of how language differs among document types.
Apache License 2.0

Problem with produce_scattertext_explorer #43

Closed: bassimeledath closed this issue 5 years ago

bassimeledath commented 5 years ago

A little background: building the corpus works just fine:

corpus = st.CorpusFromPandas(data, category_col='sentiment', text_col='reviews', nlp=nlp).build()

My code:

html = produce_scattertext_explorer(corpus,
                                    category='sentiment',
                                    category_name='Positive',
                                    not_category_name='Negative',
                                    width_in_pixels=1000,
                                    minimum_term_frequency=5,
                                    metadata='stars',
                                    term_significance=st.LogOddsRatioUninformativeDirichletPrior(),
                                    include_term_category_counts=True)

The error:

AssertionError                            Traceback (most recent call last)

in ()
      7 metadata = 'stars',
      8 term_significance = st.LogOddsRatioUninformativeDirichletPrior(),
----> 9 include_term_category_counts=True)
     10 file_name = 'test.html'
     11 open(file_name, 'wb').write(html.encode('utf-8'))

~/anaconda3/lib/python3.6/site-packages/scattertext/__init__.py in produce_scattertext_explorer(corpus, category, category_name, not_category_name, protocol, pmi_threshold_coefficient, minimum_term_frequency, minimum_not_category_term_frequency, max_terms, filter_unigrams, height_in_pixels, width_in_pixels, max_snippets, max_docs_per_category, metadata, scores, x_coords, y_coords, original_x, original_y, rescale_x, rescale_y, singleScoreMode, sort_by_dist, reverse_sort_scores_for_not_category, use_full_doc, transform, jitter, gray_zero_scores, term_ranker, asian_mode, use_non_text_features, show_top_terms, show_characteristic, word_vec_use_p_vals, max_p_val, p_value_colors, term_significance, save_svg_button, x_label, y_label, d3_url, d3_scale_chromatic_url, pmi_filter_thresold, alternative_text_field, terms_to_include, semiotic_square, num_terms_semiotic_square, not_categories, neutral_categories, extra_categories, show_neutral, neutral_category_name, get_tooltip_content, x_axis_values, y_axis_values, color_func, term_scorer, show_axes, horizontal_line_y_position, vertical_line_x_position, show_cross_axes, show_extra, extra_category_name, censor_points, center_label_over_points, x_axis_labels, y_axis_labels, topic_model_term_lists, topic_model_preview_size, metadata_descriptions, vertical_lines, characteristic_scorer, term_colors, unified_context, show_category_headings, include_term_category_counts, div_name, alternative_term_func, return_data)
    446 extra_categories=extra_categories,
    447 background_scorer=characteristic_scorer,
--> 448 include_term_category_counts=include_term_category_counts)
    449 if return_data:
    450 return scatter_chart_data

~/anaconda3/lib/python3.6/site-packages/scattertext/ScatterChartExplorer.py in to_dict(self, category, category_name, not_category_name, scores, metadata, max_docs_per_category, transform, alternative_text_field, title_case_names, not_categories, neutral_categories, extra_categories, neutral_category_name, extra_category_name, background_scorer, include_term_category_counts)
    108 neutral_categories=neutral_categories,
    109 extra_categories=extra_categories,
--> 110 background_scorer=background_scorer)
    111 docs_getter = self._make_docs_getter(max_docs_per_category, alternative_text_field)
    112 if neutral_category_name is None:

~/anaconda3/lib/python3.6/site-packages/scattertext/ScatterChart.py in to_dict(self, category, category_name, not_category_name, scores, transform, title_case_names, not_categories, neutral_categories, extra_categories, background_scorer)
    266
    267 all_categories = self.term_doc_matrix.get_categories()
--> 268 assert category in all_categories
    269
    270 if not_categories is None:

AssertionError:

JasonKessler commented 5 years ago

It's hard to tell exactly what's going on without seeing the contents of data, which is presumably a pandas.DataFrame.

Your category_name parameter (currently 'Positive') must be a value in the column 'sentiment' in data. The error is saying that it isn't. The same holds for 'Negative'.

Also, the metadata parameter should be an array-like object that's the same length as data and holds the title of each document shown.
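
As a rough illustration of the metadata requirement, here is a minimal sketch that turns a column of star ratings into one title string per document and checks the length. The helper name build_metadata and the '<n> stars' title format are assumptions made for this example, not part of scattertext.

```python
def build_metadata(stars, num_docs):
    """Return one title string per document, e.g. '4 stars'.

    `stars` can be any iterable of ratings (a list or a pandas Series).
    `num_docs` is the number of documents, e.g. len(data).
    Hypothetical helper, not part of scattertext.
    """
    titles = ['{} stars'.format(s) for s in stars]
    if len(titles) != num_docs:
        raise ValueError(
            'metadata has {} entries but there are {} documents'.format(
                len(titles), num_docs))
    return titles


# Small stand-in for data['stars']; the real column would be used instead.
example_stars = [5, 1, 4]
metadata = build_metadata(example_stars, num_docs=3)
print(metadata)
```

With the DataFrame from this thread, the call would presumably look like build_metadata(data['stars'], len(data)), and the result would be passed as metadata=... instead of the string 'stars'.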

bassimeledath commented 5 years ago

The pandas DataFrame has three columns: reviews, stars, and sentiment. sentiment is a binary categorical variable (positive or negative), reviews are strings, and stars range from 1 to 5.

I changed my code to the version below (since metadata is an optional parameter, I dropped it):

html = produce_scattertext_explorer(corpus,
                                    category='sentiment',
                                    category_name='positive',
                                    not_category_name='negative',
                                    width_in_pixels=1000,
                                    minimum_term_frequency=5,
                                    term_significance=st.LogOddsRatioUninformativeDirichletPrior(),
                                    include_term_category_counts=False)
file_name = 'test.html'
open(file_name, 'wb').write(html.encode('utf-8'))
IFrame(src=file_name, width=1200, height=700)

Getting this error:


AssertionError                            Traceback (most recent call last)

in ()
      7 term_significance = st.LogOddsRatioUninformativeDirichletPrior(),
      8 metadata = np.array([x for x in data['stars']]),
----> 9 include_term_category_counts=False)
     10 file_name = 'test.html'
     11 open(file_name, 'wb').write(html.encode('utf-8'))

~/anaconda3/lib/python3.6/site-packages/scattertext/__init__.py in produce_scattertext_explorer(corpus, category, category_name, not_category_name, protocol, pmi_threshold_coefficient, minimum_term_frequency, minimum_not_category_term_frequency, max_terms, filter_unigrams, height_in_pixels, width_in_pixels, max_snippets, max_docs_per_category, metadata, scores, x_coords, y_coords, original_x, original_y, rescale_x, rescale_y, singleScoreMode, sort_by_dist, reverse_sort_scores_for_not_category, use_full_doc, transform, jitter, gray_zero_scores, term_ranker, asian_mode, use_non_text_features, show_top_terms, show_characteristic, word_vec_use_p_vals, max_p_val, p_value_colors, term_significance, save_svg_button, x_label, y_label, d3_url, d3_scale_chromatic_url, pmi_filter_thresold, alternative_text_field, terms_to_include, semiotic_square, num_terms_semiotic_square, not_categories, neutral_categories, extra_categories, show_neutral, neutral_category_name, get_tooltip_content, x_axis_values, y_axis_values, color_func, term_scorer, show_axes, horizontal_line_y_position, vertical_line_x_position, show_cross_axes, show_extra, extra_category_name, censor_points, center_label_over_points, x_axis_labels, y_axis_labels, topic_model_term_lists, topic_model_preview_size, metadata_descriptions, vertical_lines, characteristic_scorer, term_colors, unified_context, show_category_headings, include_term_category_counts, div_name, alternative_term_func, return_data)
    446 extra_categories=extra_categories,
    447 background_scorer=characteristic_scorer,
--> 448 include_term_category_counts=include_term_category_counts)
    449 if return_data:
    450 return scatter_chart_data

~/anaconda3/lib/python3.6/site-packages/scattertext/ScatterChartExplorer.py in to_dict(self, category, category_name, not_category_name, scores, metadata, max_docs_per_category, transform, alternative_text_field, title_case_names, not_categories, neutral_categories, extra_categories, neutral_category_name, extra_category_name, background_scorer, include_term_category_counts)
    108 neutral_categories=neutral_categories,
    109 extra_categories=extra_categories,
--> 110 background_scorer=background_scorer)
    111 docs_getter = self._make_docs_getter(max_docs_per_category, alternative_text_field)
    112 if neutral_category_name is None:

~/anaconda3/lib/python3.6/site-packages/scattertext/ScatterChart.py in to_dict(self, category, category_name, not_category_name, scores, transform, title_case_names, not_categories, neutral_categories, extra_categories, background_scorer)
    266
    267 all_categories = self.term_doc_matrix.get_categories()
--> 268 assert category in all_categories
    269
    270 if not_categories is None:

AssertionError:

Thank you for helping me out!

bassimeledath commented 5 years ago

Here is an attached PDF of my code and output: scattertext_use.pdf

JasonKessler commented 5 years ago

Apologies! I misread your code. The category parameter should be 'positive', i.e. a value from the 'sentiment' column rather than the column name. The category_name is the label under which that category is rendered on the plot.
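
To make the distinction concrete, here is a minimal sketch of the corrected call, with a small check that turns the bare AssertionError into a readable message. The helper names check_category and make_explorer_html are hypothetical; the scattertext call reuses only the parameters already shown in this thread, and it assumes the corpus was built with category_col='sentiment' as above.

```python
def check_category(category, available_categories):
    """Fail with a readable message if `category` is not a known label.

    Hypothetical helper; it mirrors the `assert category in all_categories`
    line in the traceback, but reports which labels are available.
    """
    available = list(available_categories)
    if category not in available:
        raise ValueError(
            'category={!r} is not one of the corpus categories {!r}; '
            'pass a value from the category column, not the column name'
            .format(category, available))
    return category


def make_explorer_html(corpus):
    """Build the explorer HTML for the corpus described in this thread."""
    import scattertext as st  # imported lazily so the check above runs without it

    # 'positive' is a value in the 'sentiment' column; 'sentiment' itself is not.
    category = check_category('positive', corpus.get_categories())
    return st.produce_scattertext_explorer(
        corpus,
        category=category,             # value from the category column
        category_name='Positive',      # display label on the plot
        not_category_name='Negative',  # display label for the other class
        width_in_pixels=1000,
        minimum_term_frequency=5,
        term_significance=st.LogOddsRatioUninformativeDirichletPrior())


# The check alone, using the labels described in this thread:
print(check_category('positive', ['positive', 'negative']))
```

Calling check_category('sentiment', ['positive', 'negative']) raises a ValueError that names the available labels, which is the mistake in the original call.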

bassimeledath commented 5 years ago

Works perfectly! Thanks!