MaartenGr / BERTopic

Leveraging BERT and c-TF-IDF to create easily interpretable topics.
https://maartengr.github.io/BERTopic/
MIT License
6.19k stars 765 forks

There are Chinese characters in my project, but after calling the visualize_document_datamap() method, the characters appear as garbled text. #2211

Open superseanyoung opened 1 week ago

superseanyoung commented 1 week ago

Have you searched existing issues? 🔎

Describe the bug

fig = topic_model.visualize_document_datamap(
    sentences,
    topics=topics,
    reduced_embeddings=reduced_embeddings,
    custom_labels=custom_labels,
    title='文档和主题的分布',  # "Distribution of documents and topics"
    sub_title='基于 BERTopic 的主题建模',  # "Topic modeling based on BERTopic"
    width=1200,
    height=1200
)

Even after setting plt.rcParams['font.sans-serif'] = ['SimHei'], the Chinese characters still do not render correctly.
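One thing worth ruling out first is whether matplotlib can find any installed font that contains Chinese glyphs. The following is a minimal sketch, not part of BERTopic or DataMapPlot: the helper names `pick_cjk_font` and `installed_font_names` and the candidate font list are illustrative assumptions, and the right candidates depend on the operating system.

```python
# Hypothetical helpers for checking CJK font availability; names and the
# candidate list are assumptions for illustration only.
CJK_CANDIDATES = ("SimHei", "Microsoft YaHei", "Noto Sans CJK SC", "PingFang SC")


def pick_cjk_font(installed_names, candidates=CJK_CANDIDATES):
    """Return the first candidate font that is installed, or None."""
    installed = set(installed_names)
    for name in candidates:
        if name in installed:
            return name
    return None


def installed_font_names():
    """List the font family names that matplotlib has registered."""
    # Imported lazily so pick_cjk_font can be used without matplotlib.
    from matplotlib import font_manager

    return sorted({font.name for font in font_manager.fontManager.ttflist})


# Usage (requires matplotlib):
# font = pick_cjk_font(installed_font_names())
# if font is None:
#     print("No candidate CJK font is registered with matplotlib")
```

If this returns `None`, the garbled output is likely a missing-font problem rather than a plotting-parameter problem. If it returns a name, the font exists and the question becomes whether the plotting call actually uses it.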

Reproduction

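from bertopic import BERTopic
from umap import UMAP

# Assumes topic_model, sentences, topics, and embeddings already exist from an earlier fit
# Reduce the embeddings to 2D for plotting
reduced_embeddings = UMAP(n_neighbors=15, n_components=2, min_dist=0.0, metric='cosine').fit_transform(embeddings)
fig = topic_model.visualize_document_datamap(
    sentences,
    topics=topics,
    reduced_embeddings=reduced_embeddings,
    # custom_labels=custom_labels,
    title='文档和主题的分布',  # "Distribution of documents and topics"
    sub_title='基于 BERTopic 的主题建模',  # "Topic modeling based on BERTopic"
    width=1200,
    height=1200
)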

BERTopic Version

0.16.4

MaartenGr commented 1 week ago

Hmmm, I'm not entirely sure what is needed here. Have you tried posting an issue on the DataMapPlot repository? I think there isn't much to do from my end since I'm just calling that package and passing the data.

superseanyoung commented 1 week ago

Does the visualize_document_datamap() method accept parameters for setting the display font?

MaartenGr commented 1 week ago

@superseanyoung You can check all parameters implemented here or here
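As a possible way to act on this, here is a minimal sketch of passing a font through to the plotting backend. It rests on two assumptions that should be checked against the installed versions: that `visualize_document_datamap()` forwards extra keyword arguments to DataMapPlot's `create_plot`, and that `create_plot` accepts a `font_family` argument. The wrapper name `datamap_with_font` is hypothetical.

```python
def datamap_with_font(topic_model, docs, font_family="SimHei", **plot_kwargs):
    """Call visualize_document_datamap with an explicit font family.

    Assumption: extra keyword arguments are forwarded to DataMapPlot,
    and DataMapPlot understands `font_family`. Verify both against the
    parameter lists of the versions you have installed.
    """
    return topic_model.visualize_document_datamap(
        docs,
        font_family=font_family,
        **plot_kwargs,
    )


# Usage with the objects from the reproduction above:
# fig = datamap_with_font(
#     topic_model,
#     sentences,
#     font_family="SimHei",
#     topics=topics,
#     reduced_embeddings=reduced_embeddings,
#     title='文档和主题的分布',
#     sub_title='基于 BERTopic 的主题建模',
#     width=1200,
#     height=1200,
# )
```

The chosen font must be installed and visible to matplotlib for this to have any effect. If the keyword is not accepted, the call should raise a `TypeError`, which would indicate that the assumption does not hold for that version.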