JasonKessler / scattertext

Beautiful visualizations of how language differs among document types.
Apache License 2.0

metadata length assertion error #27

Open choltz95 opened 6 years ago

choltz95 commented 6 years ago

Hi, I am getting the following assertion error when I include the metadata parameter:

assert len(metadata) == len(data['labels'])

I am not sure I understand the error. When I check, the length of the column I am using as my metadata is the same as the length of my label column. Is there anything else I should be checking?

When I test the demo program I get no error.

Here is some relevant code:

corpus = st.CorpusFromPandas(df, category_col='binary_bias', text_col='content', nlp=nlp).build()

html = st.produce_scattertext_explorer(corpus, category='left', category_name='Left', not_category_name='Right', width_in_pixels=1000, metadata=df['topic'])

JasonKessler commented 6 years ago

This error crops up when documents are either empty or composed of very infrequent tokens. I'm working on fixing this, but in the meantime, please try a couple of things:

Instead of metadata=df['topic'], try metadata=corpus.get_df()['topic'].

If that doesn't work, please try

st.produce_scattertext_explorer(corpus, category='left', category_name='Left', not_category_name='Right', width_in_pixels=1000, metadata=corpus.get_df()['topic'], minimum_term_frequency=0)
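
For anyone hitting the same assertion, here is a minimal sketch of why the two lengths can differ. It assumes the mismatch comes from unusable documents being dropped while the corpus is built, which this thread suggests but does not confirm. It does not call scattertext at all: `keep_usable_rows` and the sample rows are hypothetical stand-ins, and only the column names come from the code above.

```python
# Hypothetical stand-in for a corpus builder that drops unusable documents.
# This is NOT scattertext's implementation, only an illustration.
def keep_usable_rows(rows, text_key="content"):
    """Return only the rows whose text has at least one token."""
    return [row for row in rows if row[text_key].split()]


# Made-up sample data using the column names from the question.
rows = [
    {"content": "tax cuts help growth", "binary_bias": "right", "topic": "economy"},
    {"content": "   ", "binary_bias": "left", "topic": "health"},  # empty document
    {"content": "expand public health care", "binary_bias": "left", "topic": "health"},
]

kept = keep_usable_rows(rows)
labels = [row["binary_bias"] for row in kept]

# Metadata read from the original rows still counts the dropped document.
original_metadata = [row["topic"] for row in rows]

# Metadata read from the kept rows lines up with the labels.
aligned_metadata = [row["topic"] for row in kept]

print(len(original_metadata), len(labels))
print(len(aligned_metadata) == len(labels))
```

Under that assumption, reading the metadata from the same filtered rows that produced the labels, which is what `corpus.get_df()['topic']` is meant to do, keeps the two in step.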