JasonKessler / scattertext

Beautiful visualizations of how language differs among document types.
Apache License 2.0

saving to HTML breaks encoding #102

Closed Stevod closed 2 years ago

Stevod commented 2 years ago

Thank you for submitting a bug report!

Steps to Reproduce

Run this code. It fails when saving the HTML to disk:

# Here is the code snippet
import scattertext as st

df = st.SampleCorpora.ConventionData2012.get_data().assign(
    parse=lambda df: df.text.apply(st.whitespace_nlp_with_sentences)
)

corpus = st.CorpusFromParsedDocuments(
    df, category_col='party', parsed_col='parse'
).build().get_unigram_corpus().compact(st.AssociationCompactor(2000))

html = st.produce_scattertext_explorer(
    corpus,
    category='democrat', category_name='Democratic', not_category_name='Republican',
    minimum_term_frequency=0, pmi_threshold_coefficient=0,
    width_in_pixels=1000, metadata=corpus.get_df()['speaker'],
    transform=st.Scalers.dense_rank
)
open('./demo_compact.html', 'w').write(html)

Expected behavior

The script should write an interactive .html file that opens in a browser.

JasonKessler commented 2 years ago

What is the error you're seeing?

JasonKessler commented 2 years ago

Since I can't reproduce this locally and the bug report is incomplete, I'm closing the issue. Happy to reopen it if some actionable information is provided.
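
For anyone who lands here with a similar symptom: the error was never posted, so the cause is unconfirmed. Going only by the issue title, one plausible explanation is that `open(path, 'w')` used a platform default encoding that could not represent some characters in the generated HTML. A minimal sketch of a workaround under that assumption is to pass the encoding explicitly. The helper `save_html` and the `sample_html` string are hypothetical stand-ins; in the original script, `html` would be the string returned by `st.produce_scattertext_explorer`.

```python
import os
import tempfile


def save_html(html: str, path: str) -> None:
    """Write an HTML string to disk as UTF-8 instead of the platform default."""
    # Hypothetical helper, not part of scattertext.
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)


# Stand-in for the explorer output: any string with non-ASCII characters.
sample_html = "<html><body>café naïve “quoted”</body></html>"

out_path = os.path.join(tempfile.mkdtemp(), "demo_compact.html")
save_html(sample_html, out_path)

# Read the file back with the same encoding to confirm nothing was lost.
with open(out_path, encoding="utf-8") as f:
    round_tripped = f.read()
```

If the failure persists with an explicit encoding, the full traceback along with the OS and Python version would be the actionable information requested above.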