MaartenGr / BERTopic

Leveraging BERT and c-TF-IDF to create easily interpretable topics.
https://maartengr.github.io/BERTopic/
MIT License

Cannot Implement Multi-color Intertopic Distance Map Visualization #1496

Open pariskang opened 1 year ago

pariskang commented 1 year ago

Hello,

I'm trying to customize the intertopic distance map in BERTopic to show different color bubbles for each topic. I've attempted to modify the _topic.py file to include Seaborn's color palette but haven't achieved the desired effect. I am wondering how best to proceed.

My Approach: Here is the relevant section of the code I modified in the _topic.py file to support multi-color visualization:

# I added this code to generate different colours
color_palette = sns.color_palette("hsv", len(topics)).as_hex()

# ... continued with other parts of the code
df = pd.DataFrame({"x": embeddings[:, 0], "y": embeddings[:, 1],
                   "Topic": topic_list, "Words": words, "Size": frequencies, "Color": color_palette})

I did not see any change in the colors of the bubbles after running this modified code.

What I Expected: Each bubble in the intertopic distance map would have a unique color, as defined by the Seaborn color palette.

What Happened: The colors in the intertopic distance map remained unchanged.

Questions:

1. Is the _topic.py the correct place to modify for customizing colors in the intertopic distance map?
2. Are there specific steps or methods within BERTopic that I need to use to change the colors effectively?
3. Could you guide me on how to modify the code so that it achieves the desired multi-color visualization?

Thank you for your time and consideration. I look forward to your guidance on resolving this issue.

Best regards, Paris Kang


MaartenGr commented 1 year ago

Thank you for opening up this issue. Unfortunately, the formatting of your code on GitHub is still a bit unclear. Could you make sure everything is formatted as Python? Also, could you go into more depth about what the issue is and what kind of resolution you are looking for? Make sure to be complete in your description.

pariskang commented 1 year ago

Thank you for your kind and swift reply. I have just updated the issue description.

MaartenGr commented 1 year ago

You are giving it a color palette that is overwritten by the blue-ish color that you see here:

https://github.com/MaartenGr/BERTopic/blob/951b97645acdf55e184889c761a83d2e1d73812f/bertopic/plotting/_topics.py#L91C5-L91C32

That function, get_color, defines the color that is chosen upon selection of the visualization. You would have to adjust the code there to show different colors.

More specifically, I would advise removing the slider and therefore the get_color function as it will mess with your multi-color visualization.
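
For illustration, here is a minimal sketch of what that suggestion could look like as a standalone plot rather than a patch to BERTopic itself. Both make_topic_colors and plot_topics are hypothetical helpers, not part of BERTopic. The palette is built with the standard-library colorsys module as a stand-in for Seaborn's "hsv" palette. The sketch assumes you already have the 2D coordinates, topic ids, sizes, and words. The key points are that the topic is passed to Plotly's color argument, that no slider is added, and that the marker fill color is never overwritten afterwards.

```python
import colorsys


def make_topic_colors(topics):
    """Map each topic id (as a string) to a hex color spaced evenly around the hue wheel."""
    n = max(len(topics), 1)
    colors = {}
    for i, topic in enumerate(topics):
        r, g, b = colorsys.hsv_to_rgb(i / n, 0.65, 0.9)
        colors[str(topic)] = "#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)
        )
    return colors


def plot_topics(x, y, topics, sizes, words):
    """Hypothetical helper: one bubble per topic, colored per topic, with no slider."""
    # Imported lazily so the palette helper above works without plotly installed.
    import plotly.express as px

    labels = [str(t) for t in topics]  # strings make Plotly treat color as discrete
    fig = px.scatter(
        x=x,
        y=y,
        size=sizes,
        color=labels,
        color_discrete_map=make_topic_colors(topics),
        hover_name=words,
        size_max=40,
        template="simple_white",
    )
    # Restyle only the outline. Setting marker.color here (as a single-color
    # update or a slider step would) replaces the per-topic fill colors.
    fig.update_traces(marker=dict(line=dict(width=2, color="DarkSlateGrey")))
    return fig
```

Calling plot_topics(embeddings[:, 0], embeddings[:, 1], topic_list, frequencies, words).show() with the variables from the snippet in the original post should then render one distinctly colored bubble per topic, assuming those variables hold what their names suggest.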