MaartenGr / BERTopic

Leveraging BERT and c-TF-IDF to create easily interpretable topics.
https://maartengr.github.io/BERTopic/
MIT License
6.04k stars 757 forks

Support of clustering plot (2D UMAP) #584

Closed karelin closed 2 years ago

karelin commented 2 years ago

Hi there, just wondering whether the current version of BERTopic supports a 2D UMAP plot with clustering, like the first plot in the original post https://towardsdatascience.com/topic-modeling-with-bert-779f7db187e6

I didn't find such a plot in the documentation, but it could be rather useful when analysing a document collection.

drob-xx commented 2 years ago

I think that the answer is that BERTopic doesn't 'support' this particular visualization, but it is relatively easy to do on your own. What you need is a 2D representation of the embeddings. The simplest way to do this is to do a 2D reduction on the saved UMAP embeddings within your current model so something like:

umap_2d = umap.UMAP(n_components=2).fit_transform(MyBERTopicModel.umap_model.embedding_)

Then you can use the output as x, y coordinates for a scatter plot. The above reduction is not going to be very pretty, however, because it is a 2D UMAP reduction of the 5D UMAP reduction of the original embeddings. You can get a 'nicer' looking scatter either by creating a 2D t-SNE projection from umap_model.embedding_ as above (the downside being that t-SNE takes longer than UMAP), or by taking the original embeddings and reducing them to 2D with UMAP, the way Maarten did in the original Medium article. Not sure if any of this is helpful.
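A minimal sketch of the second option described above (reducing the original embeddings straight to 2D), assuming umap-learn is installed and that the embeddings are the same document embeddings that were passed to BERTopic. The helper names reduce_to_2d and group_points_by_topic are hypothetical, and the UMAP parameters are illustrative choices rather than values taken from BERTopic.

```python
from collections import defaultdict


def reduce_to_2d(embeddings, random_state=42):
    """Reduce document embeddings to 2D with UMAP (needs umap-learn)."""
    # Imported lazily so the grouping helper below works without umap-learn.
    from umap import UMAP

    reducer = UMAP(n_neighbors=15, n_components=2, min_dist=0.0,
                   metric="cosine", random_state=random_state)
    return reducer.fit_transform(embeddings)


def group_points_by_topic(coords, topics):
    """Group 2D points by topic id so each topic can be drawn in its own color."""
    grouped = defaultdict(list)
    for (x, y), topic in zip(coords, topics):
        grouped[topic].append((float(x), float(y)))
    return dict(grouped)


# Toy coordinates standing in for the output of reduce_to_2d().
toy_coords = [(0.0, 1.0), (0.5, 1.5), (3.0, -2.0)]
toy_topics = [0, 0, -1]  # -1 is the outlier topic in BERTopic
toy_groups = group_points_by_topic(toy_coords, toy_topics)
```

Each group can then be passed to a scatter call (for example one matplotlib plt.scatter call per topic), which gives every cluster its own color.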

I totally agree that plotting out the embeddings is very useful. It has fundamentally altered how I understand BERTopic. If you want code to do some of the above, you can refer to a github repo I put together as part of the discussion at #582. Hope this is helpful and not too in the weeds.

karelin commented 2 years ago

Hey Dan! Thank you very much. I'm thinking of using two UMAP transforms (ND -> 5D and ND -> 2D), then.

MaartenGr commented 2 years ago

@karelin You are in luck! I am almost finished with a function called .visualize_documents() that allows you to visualize the documents interactively, with options for optimizing the output, since plotting potentially millions of points can be troublesome. I intend to push it to the currently open PR sometime this week so you can try it out. In the meantime, thanks to @drob-xx for sharing your code to get started creating your own!

karelin commented 2 years ago

@MaartenGr Awesome! Could you post here when the PR is ready?

MaartenGr commented 2 years ago

@karelin The PR is still currently in the works but I just implemented the .visualize_documents feature for you to try out. You can find the documentation and instructions here and you can already install the PR with:

pip install --upgrade git+https://github.com/MaartenGr/BERTopic.git@refs/pull/578/merge` 

Doing so allows you to try it out before the release of the new version. The official release most likely will take a couple more weeks but I will let you know when it is ready!

doubianimehdi commented 2 years ago

Hi! I tried the command above to install the branch, but it didn't work: when I run pip list, bertopic is still at version 0.10.0.

doubianimehdi commented 2 years ago

I also get this warning:

WARNING: Did not find branch or tag 'refs/pull/578/merge', assuming revision or ref.

MaartenGr commented 2 years ago

It seems that there was a character at the end of the link that should have been removed. The install should be as follows:

pip install git+https://github.com/MaartenGr/BERTopic.git@refs/pull/578/merge

After doing so, you can test it by running something like the following to see if you now have the new features:

from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups

docs = fetch_20newsgroups(subset='all',  remove=('headers', 'footers', 'quotes'))["data"]
topic_model = BERTopic(verbose=True)
topics, probs = topic_model.fit_transform(docs)

hierarchical_topics = topic_model.hierarchical_topics(docs, topics)

doubianimehdi commented 2 years ago

Thank you, but even without the character it's not working.

Here's the output:

Collecting git+https://github.com/MaartenGr/BERTopic.git@refs/pull/578/merge
Cloning https://github.com/MaartenGr/BERTopic.git (to revision refs/pull/578/merge) to c:\users\doub2420\appdata\local\temp\pip-req-build-0bcznlrk
Running command git clone --filter=blob:none --quiet https://github.com/MaartenGr/BERTopic.git 'C:\Users\doub2420\AppData\Local\Temp\pip-req-build-0bcznlrk'
WARNING: Did not find branch or tag 'refs/pull/578/merge', assuming revision or ref.
Running command git fetch -q https://github.com/MaartenGr/BERTopic.git refs/pull/578/merge
Running command git checkout -q 2bcc9ea30c39419393a9469cd6f4954c73a201a0
Resolved https://github.com/MaartenGr/BERTopic.git to commit 2bcc9ea30c39419393a9469cd6f4954c73a201a0
Preparing metadata (setup.py) ... done
Requirement already satisfied: numpy>=1.20.0 in c:\users\doub2420\appdata\local\programs\python\python39\lib\site-packages (from bertopic==0.10.0) (1.21.6)
Requirement already satisfied: hdbscan>=0.8.28 in c:\users\doub2420\appdata\roaming\python\python39\site-packages (from bertopic==0.10.0) (0.8.28)
Requirement already satisfied: umap-learn>=0.5.0 in c:\users\doub2420\appdata\roaming\python\python39\site-packages (from bertopic==0.10.0) (0.5.3)
Requirement already satisfied: pandas>=1.1.5 in c:\users\doub2420\appdata\local\programs\python\python39\lib\site-packages (from bertopic==0.10.0) (1.4.2)
Requirement already satisfied: scikit-learn>=0.22.2.post1 in c:\users\doub2420\appdata\local\programs\python\python39\lib\site-packages (from bertopic==0.10.0) (0.24.2)
Requirement already satisfied: tqdm>=4.41.1 in c:\users\doub2420\appdata\local\programs\python\python39\lib\site-packages (from bertopic==0.10.0) (4.64.0)
Requirement already satisfied: sentence-transformers>=0.4.1 in c:\users\doub2420\appdata\roaming\python\python39\site-packages (from bertopic==0.10.0) (2.2.0)
Requirement already satisfied: plotly>=4.7.0 in c:\users\doub2420\appdata\local\programs\python\python39\lib\site-packages (from bertopic==0.10.0) (5.7.0)
Requirement already satisfied: pyyaml<6.0 in c:\users\doub2420\appdata\local\programs\python\python39\lib\site-packages (from bertopic==0.10.0) (5.4.1)
Requirement already satisfied: scipy>=1.0 in c:\users\doub2420\appdata\local\programs\python\python39\lib\site-packages (from hdbscan>=0.8.28->bertopic==0.10.0) (1.8.0)
Requirement already satisfied: joblib>=1.0 in c:\users\doub2420\appdata\roaming\python\python39\site-packages (from hdbscan>=0.8.28->bertopic==0.10.0) (1.1.0)
Requirement already satisfied: cython>=0.27 in c:\users\doub2420\appdata\roaming\python\python39\site-packages (from hdbscan>=0.8.28->bertopic==0.10.0) (0.29.28)
Requirement already satisfied: pytz>=2020.1 in c:\users\doub2420\appdata\local\programs\python\python39\lib\site-packages (from pandas>=1.1.5->bertopic==0.10.0) (2022.1)
Requirement already satisfied: python-dateutil>=2.8.1 in c:\users\doub2420\appdata\local\programs\python\python39\lib\site-packages (from pandas>=1.1.5->bertopic==0.10.0) (2.8.2)
Requirement already satisfied: six in c:\users\doub2420\appdata\local\programs\python\python39\lib\site-packages (from plotly>=4.7.0->bertopic==0.10.0) (1.16.0)
Requirement already satisfied: tenacity>=6.2.0 in c:\users\doub2420\appdata\local\programs\python\python39\lib\site-packages (from plotly>=4.7.0->bertopic==0.10.0) (8.0.1)
Requirement already satisfied: threadpoolctl>=2.0.0 in c:\users\doub2420\appdata\local\programs\python\python39\lib\site-packages (from scikit-learn>=0.22.2.post1->bertopic==0.10.0) (3.1.0)
Requirement already satisfied: transformers<5.0.0,>=4.6.0 in c:\users\doub2420\appdata\roaming\python\python39\site-packages (from sentence-transformers>=0.4.1->bertopic==0.10.0) (4.18.0)
Requirement already satisfied: torch>=1.6.0 in c:\users\doub2420\appdata\local\programs\python\python39\lib\site-packages (from sentence-transformers>=0.4.1->bertopic==0.10.0) (1.11.0)
Requirement already satisfied: torchvision in c:\users\doub2420\appdata\roaming\python\python39\site-packages (from sentence-transformers>=0.4.1->bertopic==0.10.0) (0.12.0)
Requirement already satisfied: nltk in c:\users\doub2420\appdata\roaming\python\python39\site-packages (from sentence-transformers>=0.4.1->bertopic==0.10.0) (3.7)
Requirement already satisfied: sentencepiece in c:\users\doub2420\appdata\local\programs\python\python39\lib\site-packages (from sentence-transformers>=0.4.1->bertopic==0.10.0) (0.1.96)
Requirement already satisfied: huggingface-hub in c:\users\doub2420\appdata\roaming\python\python39\site-packages (from sentence-transformers>=0.4.1->bertopic==0.10.0) (0.5.1)
Requirement already satisfied: colorama in c:\users\doub2420\appdata\local\programs\python\python39\lib\site-packages (from tqdm>=4.41.1->bertopic==0.10.0) (0.4.3)
Requirement already satisfied: numba>=0.49 in c:\users\doub2420\appdata\roaming\python\python39\site-packages (from umap-learn>=0.5.0->bertopic==0.10.0) (0.55.1)
Requirement already satisfied: pynndescent>=0.5 in c:\users\doub2420\appdata\roaming\python\python39\site-packages (from umap-learn>=0.5.0->bertopic==0.10.0) (0.5.6)
Requirement already satisfied: setuptools in c:\users\doub2420\appdata\local\programs\python\python39\lib\site-packages (from numba>=0.49->umap-learn>=0.5.0->bertopic==0.10.0) (58.1.0)
Requirement already satisfied: llvmlite<0.39,>=0.38.0rc1 in c:\users\doub2420\appdata\roaming\python\python39\site-packages (from numba>=0.49->umap-learn>=0.5.0->bertopic==0.10.0) (0.38.0)
Requirement already satisfied: typing-extensions in c:\users\doub2420\appdata\local\programs\python\python39\lib\site-packages (from torch>=1.6.0->sentence-transformers>=0.4.1->bertopic==0.10.0) (4.2.0)
Requirement already satisfied: requests in c:\users\doub2420\appdata\local\programs\python\python39\lib\site-packages (from transformers<5.0.0,>=4.6.0->sentence-transformers>=0.4.1->bertopic==0.10.0) (2.27.1)
Requirement already satisfied: filelock in c:\users\doub2420\appdata\local\programs\python\python39\lib\site-packages (from transformers<5.0.0,>=4.6.0->sentence-transformers>=0.4.1->bertopic==0.10.0) (3.6.0)
Requirement already satisfied: sacremoses in c:\users\doub2420\appdata\roaming\python\python39\site-packages (from transformers<5.0.0,>=4.6.0->sentence-transformers>=0.4.1->bertopic==0.10.0) (0.0.50)
Requirement already satisfied: regex!=2019.12.17 in c:\users\doub2420\appdata\local\programs\python\python39\lib\site-packages (from transformers<5.0.0,>=4.6.0->sentence-transformers>=0.4.1->bertopic==0.10.0) (2022.4.24)
Requirement already satisfied: tokenizers!=0.11.3,<0.13,>=0.11.1 in c:\users\doub2420\appdata\local\programs\python\python39\lib\site-packages (from transformers<5.0.0,>=4.6.0->sentence-transformers>=0.4.1->bertopic==0.10.0) (0.12.1)
Requirement already satisfied: packaging>=20.0 in c:\users\doub2420\appdata\local\programs\python\python39\lib\site-packages (from transformers<5.0.0,>=4.6.0->sentence-transformers>=0.4.1->bertopic==0.10.0) (21.3)
Requirement already satisfied: click in c:\users\doub2420\appdata\local\programs\python\python39\lib\site-packages (from nltk->sentence-transformers>=0.4.1->bertopic==0.10.0) (8.0.0)
Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in c:\users\doub2420\appdata\local\programs\python\python39\lib\site-packages (from torchvision->sentence-transformers>=0.4.1->bertopic==0.10.0) (9.1.0)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in c:\users\doub2420\appdata\local\programs\python\python39\lib\site-packages (from packaging>=20.0->transformers<5.0.0,>=4.6.0->sentence-transformers>=0.4.1->bertopic==0.10.0) (3.0.8)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in c:\users\doub2420\appdata\local\programs\python\python39\lib\site-packages (from requests->transformers<5.0.0,>=4.6.0->sentence-transformers>=0.4.1->bertopic==0.10.0) (1.26.9)
Requirement already satisfied: idna<4,>=2.5 in c:\users\doub2420\appdata\local\programs\python\python39\lib\site-packages (from requests->transformers<5.0.0,>=4.6.0->sentence-transformers>=0.4.1->bertopic==0.10.0) (3.3)
Requirement already satisfied: charset-normalizer~=2.0.0 in c:\users\doub2420\appdata\local\programs\python\python39\lib\site-packages (from requests->transformers<5.0.0,>=4.6.0->sentence-transformers>=0.4.1->bertopic==0.10.0) (2.0.12)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\doub2420\appdata\local\programs\python\python39\lib\site-packages (from requests->transformers<5.0.0,>=4.6.0->sentence-transformers>=0.4.1->bertopic==0.10.0) (2021.10.8)

doubianimehdi commented 2 years ago

And when I try hierarchical_topics = topic_model.hierarchical_topics(abstract, topics), I get:

AttributeError                            Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_14196/2730210245.py in <module>
----> 1 hierarchical_topics = topic_model.hierarchical_topics(abstract, topics)

AttributeError: 'BERTopic' object has no attribute 'hierarchical_topics'

MaartenGr commented 2 years ago

I would advise starting from a completely fresh environment and then installing BERTopic via the link provided instead. Then, after installing, make sure to restart the notebook that you are working in.
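The pip log above reports bertopic==0.10.0 even for the PR checkout, so the version number alone does not seem to show whether the new code is the one being imported. A small sketch, using only the standard library, that checks for the attribute directly; has_feature and installed_version are hypothetical helper names, not part of BERTopic.

```python
import importlib
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def has_feature(module_name, class_name, attribute):
    """Check whether module_name.class_name exposes the given attribute."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    cls = getattr(module, class_name, None)
    return cls is not None and hasattr(cls, attribute)


# In a freshly restarted kernel one would run, for example:
#   installed_version("bertopic")
#   has_feature("bertopic", "BERTopic", "hierarchical_topics")
# The standard-library call below only demonstrates the helper.
demo = has_feature("collections", "OrderedDict", "move_to_end")
```

If the check returns False after a reinstall, the notebook is probably still holding the previously imported package, which is why restarting it matters.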

doubianimehdi commented 2 years ago

I've been able to test the features and I have a request: Would that be possible to display the text with carriage returns and also, if we have a URL in the data could make the data point clickable and open the URL? Besides that, it seems to be exactly what I've been looking for!

Thank you again for your AMAZING work !

doubianimehdi commented 2 years ago

And also on the hierarchical visualization, we don't see the text on hover and we can't click on it either like the non-hierarchical one

MaartenGr commented 2 years ago

Would that be possible to display the text with carriage returns

I believe that Plotly does not generate newlines on either carriage returns or line feeds. What might work is using <br> instead but in my experience Plotly's go.Scattergl does not behave entirely the same as the regular scatterplots, so there is a chance that it will not work.
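A minimal sketch of the <br> idea, assuming the hover text is prepared before it is handed to Plotly. The helper to_hover_html is hypothetical, and whether go.Scattergl renders the result the same way as a regular scatter trace is left open, as noted above.

```python
import textwrap


def to_hover_html(text, width=80):
    """Convert newlines to <br> and wrap long lines for a Plotly hover label."""
    # Normalise Windows (\r\n) and bare carriage-return (\r) endings to \n first.
    normalised = text.replace("\r\n", "\n").replace("\r", "\n")
    wrapped_lines = []
    for line in normalised.split("\n"):
        # textwrap.wrap returns [] for an empty line; keep it as a blank row.
        wrapped_lines.extend(textwrap.wrap(line, width=width) or [""])
    return "<br>".join(wrapped_lines)


example = to_hover_html("first line\r\nsecond line that is long", width=12)
```

Applying the helper to every document before plotting would also keep very long abstracts from producing a single, screen-wide hover box.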

if we have a URL in the data could make the data point clickable and open the URL

I just checked the Plotly documentation and from what I can tell this is unfortunately not possible in their current API.

And also on the hierarchical visualization, we don't see the text on hover and we can't click on it either like the non-hierarchical one

Strange, for me the following is working without any problems:

from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups

docs = fetch_20newsgroups(subset='all',  remove=('headers', 'footers', 'quotes'))["data"]
topic_model = BERTopic(verbose=True)
topics, probs = topic_model.fit_transform(docs)

hierarchical_topics = topic_model.hierarchical_topics(docs, topics)

Then, visualize the hierarchy with hover:

topic_model.visualize_hierarchy(hierarchical_topics=hierarchical_topics)

Could you share the code you have been using to get the hierarchical visualization?

doubianimehdi commented 2 years ago

I meant this function:

# Run the visualization with the original embeddings
topic_model.visualize_hierarchical_documents(abstract, hierarchical_topics, embeddings=embeddings)

Hovering doesn't work there the way it does in:

# Run the visualization with the original embeddings
topic_model.visualize_documents(abstract, embeddings=embeddings)

As for the hover and clickable URL, the Doc2Map package uses this:

def plotly_interactive_map(self, G=None, root=None):

    def cluster(node, lLeaf, image=True):

        fig = go.Figure(go.Scatter(
            y = [self.lDocEmbedding2D[i,0] for i in lLeaf],
            x = [self.lDocEmbedding2D[i,1] for i in lLeaf],
            mode = 'markers',
            #marker = {"size": 0.7}
            customdata=[([data["URL"]], [data["label"]]) for data in self.lData],
            hovertemplate=(
                "Label: <b>%{customdata[1]}</b><br>"+
                "URL: %{customdata[0]}"+
                "<extra></extra>")
        ))

I believe that customdata and hovertemplate could be used in similar manner in scattergl (https://plotly.github.io/plotly.py-docs/generated/plotly.graph_objects.Scattergl.html) It mentions the customdata and hovertemplate too ...

Or maybe i'm completely wrong, I've never built python packages before ...

Thanks again !

MaartenGr commented 2 years ago

topic_model.visualize_hierarchical_documents(abstract, hierarchical_topics, embeddings=embeddings)

That is correct, hovering is turned off by default as you risk memory errors by loading in so many documents. The following should do the trick:

topic_model.visualize_hierarchical_documents(
    abstract, 
    hierarchical_topics, 
    embeddings=embeddings, 
    hide_document_hover=False
)

There are quite a few parameters that you can find in the visualization functions. Going through the docstrings should help quite a bit.

I believe that customdata and hovertemplate could be used in similar manner in scattergl (https://plotly.github.io/plotly.py-docs/generated/plotly.graph_objects.Scattergl.html) It mentions the customdata and hovertemplate too ...

Unfortunately this is not possible at the moment as go.Scattergl has a few issues generating the same hovers as the regular go.Scatter. See this issue for example.

doubianimehdi commented 2 years ago

Thank you ! Too bad for Scattergl ... what effort would it take to modify it to use scatter instead and have the URLs ? Just to see what it would take !

Thanks so much again !!!

MaartenGr commented 2 years ago

@doubianimehdi go.Scattergl is necessary for scalability. Plotly can have issues visualizing thousands of points, let alone millions. For that reason, we need something that can handle that a bit better than the regular go.Scatter. If the hover issue gets fixed in Plotly, I'll make sure to implement it in BERTopic!
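One general way to keep the number of plotted points manageable, whatever trace type is used, is to subsample the documents per topic before plotting. The sketch below is an illustration of that idea only, not BERTopic's implementation; sample_indices_per_topic is a hypothetical helper.

```python
import random
from collections import defaultdict


def sample_indices_per_topic(topics, fraction, seed=0):
    """Pick roughly `fraction` of the documents from every topic."""
    rng = random.Random(seed)
    by_topic = defaultdict(list)
    for index, topic in enumerate(topics):
        by_topic[topic].append(index)

    kept = []
    for indices in by_topic.values():
        # Keep at least one document so no topic disappears from the plot.
        k = max(1, round(len(indices) * fraction))
        kept.extend(rng.sample(indices, k))
    return sorted(kept)


toy_topics = [0] * 10 + [1] * 4 + [-1]
kept = sample_indices_per_topic(toy_topics, fraction=0.5)
```

The returned indices can be used to select the matching documents, topics and 2D coordinates so that all three stay aligned.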

MaartenGr commented 2 years ago

Seeing this was implemented in v0.11, I will close this issue for now. Feel free to ping me if you want to continue this discussion.

doubianimehdi commented 1 year ago

@MaartenGr Hi ! Thank you for your wonderful work ! I was getting back to this implementation because I wanted to do a visualization similar to this : https://get.carrotsearch.com/foamtree/latest/demos/large.html

But for that I have to use this : https://get.carrotsearch.com/foamtree/latest/api/

I'm not a front end man at all ... unfortunately ... I was wondering if you or some talented member of this community, could do this or help to do this ?

Thank you so much again !

MaartenGr commented 1 year ago

@doubianimehdi If you want to keep it straightforward, then you can also use plotly for this as it has implemented Treemaps. Other than that, I am not familiar with carrotsearch unfortunately.

doubianimehdi commented 1 year ago

@MaartenGr Thanks ! That's what I was thinking for my Proof of Concept phase ... but later the beautiful interface of carrotsearch would be a good addition to my final product !

doubianimehdi commented 1 year ago

@MaartenGr I'm having a hard time seeing how I can use the hierarchical topics dataframe to adapt it to a treemap ... could you give me some clue to achieve this ? Thank you !

MaartenGr commented 1 year ago

@doubianimehdi No problem, it is just a few lines of code to get this working:

# Prepare children
children_left = (hierarchical_topics.Child_Left_ID + "_" + hierarchical_topics.Child_Left_Name).tolist()
children_right = (hierarchical_topics.Child_Right_ID + "_" + hierarchical_topics.Child_Right_Name).tolist()
children = children_left + children_right

# Prepare parents
parents = (hierarchical_topics.Parent_ID + "_" + hierarchical_topics.Parent_Name).tolist()
parents = parents + parents

# Plot treemap
import plotly.express as px
fig = px.treemap(names = children, parents = parents)
fig.update_traces(root_color="lightgrey")
fig.update_layout(margin = dict(t=50, l=25, r=25, b=25))
fig.show()

doubianimehdi commented 1 year ago

@MaartenGr Thank you! I want to go further and build the full hierarchy with a slider to navigate through the topic levels. What would it take to do that?

MaartenGr commented 1 year ago

@doubianimehdi I am not sure whether something like that is possible. You would have to dive into the source code of plotly I think.

doubianimehdi commented 1 year ago

@MaartenGr https://towardsdatascience.com/make-a-treemap-in-python-426cee6ee9b8 it's possible but the structure of hierarchical topics is confusing to me ... i'm having a hard time translating it ...

MaartenGr commented 1 year ago

@doubianimehdi If you follow along with that tutorial and use the code I shared above, I think it might be possible. You would have to try some things out yourself first. Do note, though, that the widget is a Jupyter-specific module and not part of plotly.
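A rough sketch of how the parent/left/right rows of hierarchical_topics could be flattened into one row per node with a depth level, which is the shape a treemap with ids and parents expects. It assumes only the column names used in the treemap snippet above; flatten_hierarchy is a hypothetical helper and the toy rows are invented for illustration.

```python
def flatten_hierarchy(rows):
    """Turn parent/left/right rows into one node per topic with a depth level."""
    names, parent_of = {}, {}
    for row in rows:
        names[row["Parent_ID"]] = row["Parent_Name"]
        for side in ("Left", "Right"):
            child_id = row[f"Child_{side}_ID"]
            names[child_id] = row[f"Child_{side}_Name"]
            parent_of[child_id] = row["Parent_ID"]

    def level(node_id):
        # Depth is the number of parent links up to the root (root is 0).
        depth = 0
        while node_id in parent_of:
            node_id = parent_of[node_id]
            depth += 1
        return depth

    return [
        {"ID": node_id, "Name": name,
         "Parent_ID": parent_of.get(node_id, ""), "Level": level(node_id)}
        for node_id, name in names.items()
    ]


# Toy rows (invented) using the column names from the treemap snippet.
toy_rows = [
    {"Parent_ID": "4", "Parent_Name": "fuel cells",
     "Child_Left_ID": "3", "Child_Left_Name": "membranes",
     "Child_Right_ID": "2", "Child_Right_Name": "catalysts"},
    {"Parent_ID": "3", "Parent_Name": "membranes",
     "Child_Left_ID": "0", "Child_Left_Name": "nafion",
     "Child_Right_ID": "1", "Child_Right_Name": "composite"},
]
nodes = {node["ID"]: node for node in flatten_hierarchy(toy_rows)}
```

With a real model, the rows could be obtained with hierarchical_topics.to_dict("records"); the root ends up with an empty Parent_ID and Level 0.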

doubianimehdi commented 1 year ago

Hi @MaartenGr I've done some tests ... i'm almost there but I can't wrap my head around something : Here is the dataframe i'm using :

ID | Name | Level | Parent_ID | Num_Documents
-- | -- | -- | -- | --
2 | pt catalyst_pt catalysts_electrocatalysts_tio2_electrocatalyst | 9 | 50 | 535
1 | pt catalysts_catalysts_membrane fuel_catalyst_electrocatalysts | 9 | 50 | 603
11 | blend membranes_methanol permeability_membranes proton_methanol fuel_hybrid membranes | 7 | 51 | 370
0 | nanocomposite membranes_composite membrane_composite membranes_membrane fuel_membranes proton | 7 | 51 | 932
18 | membrane fuel_membranes fuel_nafion membrane_nafion membranes_methanol fuel | 6 | 52 | 302
15 | membrane fuel_membranes fuel_membranes pems_membrane materials_membrane pem | 6 | 52 | 324
4 | membrane fuel_pem fuel_pemfc_membrane pem_fuel cell | 9 | 53 | 493
3 | model pemfc_pem fuel_pemfc model_pemfcs_pemfc | 9 | 53 | 505
50 | pt catalyst_pt catalysts_electrocatalysts_membrane fuel_catalysts | 8 | 54 | 1138
7 | pem fuel_membrane fuel_electrodeposition_pemfcs_pemfc | 8 | 54 | 417
10 | pt catalyst_pt catalysts_graphene oxide_membrane fuel_electrocatalysts | 7 | 55 | 385
54 | pt catalyst_pt catalysts_membrane fuel_pt nanoparticles_cells pemfcs | 7 | 55 | 1555
29 | pemfc_pemfcs_pem fuel_cells pemfcs_membrane fuel | 6 | 56 | 195
30 | membrane fuel_pem fuel_porous layer_pemfc_pemfcs | 6 | 56 | 194
13 | membrane fuel_electrolyzer_electrolyser_electrochemical impedance_pem fuel | 8 | 57 | 331
53 | membrane fuel_pem fuel_pemfcs_pemfc stack_pemfc | 8 | 57 | 998
5 | pemfc_pem fuel_membrane fuel_cell pemfc_fuel cell | 6 | 58 | 462
6 | pemfc_pemfcs_cell pemfc_membrane fuel_pem fuel | 6 | 58 | 438
58 | pemfc_pemfcs_membrane fuel_pem fuel_cell pemfc | 5 | 59 | 900
24 | membrane fuel_pem fuel_pressure drop_flow pressure_gas flow | 5 | 59 | 232
16 | membrane fuel_porous media_pem fuel_pemfc_porous | 5 | 60 | 308
56 | pem fuel_pemfc_pemfcs_membrane fuel_cell pemfc | 5 | 60 | 389
23 | fuel cells_fuel cell_membrane fuel_hydrogen fuel_pem fuel | 7 | 61 | 238
34 | fuel cell_hydrogen fuel_fuel cells_membrane fuel_pem fuel | 7 | 61 | 149
9 | multiblock copolymers_copolymers_block copolymers_copolymer_polymer | 6 | 62 | 414
51 | nanocomposite membranes_membrane fuel_composite membranes_composite membrane_membranes proton | 6 | 62 | 1302
20 | membranes_poly vinylidene_exchange membranes_sulfonic acid_vinylidene fluoride | 5 | 63 | 282
62 | membrane fuel_composite membranes_composite membrane_membranes proton_proton conductivities | 5 | 63 | 1716
21 | energy exergy_pemfc_membrane fuel_pemfc stack_fuel cell | 7 | 64 | 257
19 | power hydrogen_solar energy_hydrogen production_electrolyzer_pem electrolyzer | 7 | 64 | 285
8 | anode_carbon corrosion_electrochemical_membrane fuel_pem fuel | 8 | 65 | 415
17 | anode catalyst_membrane fuel_cathode catalyst_anode_pemfcs | 8 | 65 | 304
33 | hydrogen production_co oxidation_catalysts_co hydrogen_co2 | 6 | 66 | 153
28 | hydrogen production_steam reformer_steam reforming_methanol steam_membrane fuel | 6 | 66 | 219
32 | membrane fuel_fuel cell_fuel cells_pem fuel_anode | 7 | 67 | 158
57 | pem fuel_pemfcs_pemfc_membrane fuel_fuel cell | 7 | 67 | 1329
36 | membrane fuel_pem fuel_anodes_fuel cell_fuel cells | 7 | 68 | 146
65 | membrane fuel_pem fuel_anode_cathode catalyst_pemfcs | 7 | 68 | 719
46 | de energia_rendimiento_energia_para la_celulas combustivel | 5 | 69 | 33
47 | dans les_dans le_pour les_dans la_materiaux | 5 | 69 | 31
52 | membrane fuel_membranes fuel_membrane pem_nafion membrane_nafion membranes | 5 | 70 | 626
25 | membranes_exchange membrane_exchange membranes_membrane_proton conductivity | 5 | 70 | 226
27 | carbon composite_pemfc_pem fuel_membrane fuel_graphite | 7 | 71 | 220
12 | corrosion behavior_corrosion density_corrosion resistance_stainless steel_cathodic | 7 | 71 | 336
22 | electrolysis hydrogen_electrolyzer_water electrolysis_electrolysis water_membrane electrolysis | 6 | 72 | 238
26 | electrolyzers_water electrolyzers_water electrolysis_electrolyzer_electrolysis | 6 | 72 | 224
39 | nanocomposite membranes_membranes pems_nafion membrane_graphene oxide_composite membranes | 4 | 73 | 106
37 | nanofiber composite_nanofiber_nanofibers_electrospun nanofiber_electrospun nanofibers | 4 | 73 | 142
35 | pemfc_cooled fuel_pem fuel_pemfc stack_heat flux | 6 | 74 | 149
67 | pem fuel_membrane fuel_pemfcs_pemfc_fuel cell | 6 | 74 | 1487
70 | membrane fuel_membranes fuel_nafion membrane_nafion membranes_pemfc | 4 | 75 | 852
63 | membrane fuel_membranes proton_composite membranes_composite membrane_membranes | 4 | 75 | 1998
31 | membrane fuel_membrane electrode_membrane_exchange membrane_fuel cell | 9 | 76 | 184
42 | membrane material_membrane proton_exchange membrane_sulfonic acid_membrane | 9 | 76 | 73
64 | hydrogen production_energy exergy_electrolyzer_pem electrolyzer_electrolyser | 6 | 77 | 542
61 | fuel cell_fuel cells_membrane fuel_hydrogen fuel_electrical energy | 6 | 77 | 387
74 | membrane fuel_pem fuel_pemfcs_pemfc_fuel cell | 5 | 78 | 1636
38 | dc converter_fuel cell_converter_fuel cells_membrane fuel | 5 | 78 | 136
73 | membranes pems_nanofiber_nanofibers_membrane fuel_composite membranes | 3 | 79 | 248
75 | membrane fuel_composite membranes_composite membrane_membranes proton_pems | 3 | 79 | 2850
68 | membrane fuel_anode_cathode catalyst_pem fuel_cells pemfcs | 6 | 80 | 865
55 | pt catalyst_pt catalysts_electrocatalysts_carbon supported_membrane fuel | 6 | 80 | 1940
44 | membrane fuel_electrochemical environment_temperature voltage_fuel cell_electrolysis cell | 5 | 81 | 38
40 | porous silicon_membrane fuel_silicon membrane_membraneless fuel_mems fuel | 5 | 81 | 102
59 | pemfc_membrane fuel_pem fuel_cell pemfc_fuel cell | 4 | 82 | 1132
60 | membrane fuel_porous layer_pem fuel_pemfc_pemfcs | 4 | 82 | 697
78 | membrane fuel_pem fuel_pemfcs_pemfc_pemfc stack | 4 | 83 | 1772
81 | membrane fuel_mems fuel_silicon membrane_fuel cell_membraneless fuel | 4 | 83 | 140
41 | viscoplastic_mechanical properties_membrane fuel_stress strain_mechanical durability | 9 | 84 | 96
14 | anode_microbial fuel_membrane microbial_anode chamber_anode cathode | 9 | 84 | 329
66 | hydrogen production_hydrogen gas_membrane fuel_steam reforming_methanol steam | 5 | 85 | 372
77 | hydrogen fuel_hydrogen storage_hydrogen production_fuel cell_fuel cells | 5 | 85 | 929
76 | membrane fuel_membrane electrode_membrane_exchange membrane_membrane invention | 8 | 86 | 257
84 | anode_microbial fuel_anode chamber_membrane microbial_anode cathode | 8 | 86 | 425
79 | membrane fuel_composite membranes_composite membrane_nanocomposite_membranes proton | 2 | 87 | 3098
45 | membrane vanadium_nafion membrane_membranes vanadium_vanadium redox_vanadium permeability | 2 | 87 | 37
72 | water electrolyzers_water electrolysis_water electrolyzer_electrolyzer_electrolyzers | 5 | 88 | 462
80 | pt catalyst_pt catalysts_membrane fuel_catalyst layer_electrocatalysts | 5 | 88 | 2805
86 | microbial fuel_anode_anode chamber_membrane fuel_microbial | 7 | 89 | 682
49 | references_fuel cell_fuel cells_fuels 14_figures xi | 7 | 89 | 6
48 | 실리카 나노_연료전지 스택의_생물전기화학적 수소_고분자 전해질_전해질 연료전지 | 6 | 90 | 23
43 | menghasilkan_yang lebih_menunjukkan_menunjukkan bahwa_dan tegangan | 6 | 90 | 41
89 | anode_microbial fuel_anode chamber_membrane fuel_cod removal | 6 | 91 | 688
71 | corrosion density_corrosion resistance_steel bipolar_pemfc_pemfcs | 6 | 91 | 556
82 | membrane fuel_pem fuel_pemfc_pemfcs_fuel cell | 3 | 92 | 1829
83 | pem fuel_membrane fuel_pemfcs_pemfc_pemfc stack | 3 | 92 | 1912
90 | menghasilkan_yang lebih_menunjukkan_menunjukkan bahwa_dan tegangan | 5 | 93 | 64
91 | pemfc_anode_corrosion resistance_stainless steel_steel bipolar | 5 | 93 | 1244
85 | energy exergy_hydrogen production_hydrogen storage_hydrogen fuel_electrolyzer | 4 | 94 | 1301
88 | pt catalyst_pt catalysts_catalyst layer_catalysts_electrocatalysts | 4 | 94 | 3267
69 | rendimiento_energia_combustible de_para la_de combustible | 4 | 95 | 64
93 | pemfc_corrosion resistance_anode_membrane fuel_electrochemical | 4 | 95 | 1308
94 | pem fuel_membrane fuel_membrane pem_pemfcs_pemfc | 3 | 96 | 4568
95 | pemfc_membrane fuel_anode_electrochemical_corrosion resistance | 3 | 96 | 1372
96 | membrane fuel_pem fuel_electrolysis_membrane pem_pemfcs | 2 | 97 | 5940
92 | membrane fuel_pem fuel_pemfc_pemfcs_membrane pem | 2 | 97 | 3741
97 | membrane fuel_pem fuel_pemfcs_pemfc_membrane pem | 1 | 98 | 9681
87 | membrane fuel_composite membranes_composite membrane_nanocomposite_membranes proton | 1 | 98 | 3135
98 | membrane fuel_pem fuel_fuel cell_membrane pem_pemfc | 0 |  | 12816

Then I'm using this snippet of code:

import plotly.graph_objs as go

def generate_treemap(level):
    return go.Figure(
        go.Treemap(
            labels=tree_df['Name'],
            ids=tree_df['ID'],
            parents=tree_df['Parent_ID'],
            customdata=tree_df['Num_Documents'],
            text=tree_df['Level'].apply(lambda x: '' if x > level else None),
            hovertemplate="%{label}<br>ID: %{id}<br>Num Documents: %{customdata}",
            visible=level == 0,
        )
    )

max_level = tree_df['Level'].max()
figures = [generate_treemap(level) for level in range(max_level + 1)]

fig = go.Figure(figures[0])

for level in range(1, max_level + 1):
    fig.add_trace(figures[level]['data'][0])

steps = []
for index in range(max_level + 1):
    step = dict(
        method="restyle",
        args=["visible", [False] * (max_level + 1)],
        label=str(index)
    )
    step["args"][1][index] = True
    steps.append(step)

sliders = [dict(
    active=0,
    currentvalue={"prefix": "Level: "},
    pad={"t": 20},
    steps=steps
)]

fig.update_layout(sliders=sliders)
fig.show()

It runs, but moving the slider does not change the nesting or the level ...

Can you help ?

MaartenGr commented 1 year ago

I am not entirely sure but based on the Plotly documentation it seems that you will have to do an "update" method and not a "restyle" method. I would advise following the example linked to the Plotly documentation and replacing it with the treemap. Having said that, the link you provided shows an example with ipywidgets and the code you shared is a slider with plotly, so I am not sure whether the latter works with treemaps.

doubianimehdi commented 1 year ago

I DID IT!

import plotly.graph_objects as go
import pandas as pd

def create_treemap_data(level):
    mask = tree_df['Level'] <= level
    return go.Treemap(
        labels=tree_df.loc[mask, 'Name'],
        ids=tree_df.loc[mask, 'ID'],
        parents=tree_df.loc[mask, 'Parent_ID'],
        customdata=tree_df.loc[mask, 'Num_Documents'],
        hovertemplate="%{label}<br>ID: %{id}<br>Num Documents: %{customdata}",
    )

max_level = tree_df['Level'].max()

fig_dict = {
    "data": [create_treemap_data(0)],
    "layout": {},
    "frames": [],
}

# Create frames for each level
for level in range(1, max_level + 1):
    frame = {"data": [create_treemap_data(level)], "name": str(level)}
    fig_dict["frames"].append(frame)

# Create slider steps
steps = []
for level in range(max_level + 1):
    step = {
        "args": [
            [str(level)],
            {"frame": {"duration": 300, "redraw": True},
             "mode": "immediate",
             "transition": {"duration": 300}},
        ],
        "label": str(level),
        "method": "animate",
    }
    steps.append(step)

# Configure slider
sliders = [{"active": 0, "steps": steps, "x": 0.1, "y": 0, "len": 0.9}]

# Add slider to layout
fig_dict["layout"]["sliders"] = sliders

# Create figure from the dictionary
fig = go.Figure(fig_dict)

fig.show()

It works because it handles the transition and animation when you move the slider
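A compact sketch of the same frames-plus-slider structure, written with plain dicts so the pieces are easy to inspect. It seems likely that the level filter (keeping only nodes with Level at or below the slider value) is what changes the visible nesting, though that is an interpretation rather than something confirmed in this thread. The posted snippet creates frames only for levels 1 and up; the sketch also adds a frame named "0", on the assumption that the first slider position should be reachable again. The helper names and toy nodes are hypothetical.

```python
def treemap_trace(nodes, level):
    """Treemap trace (as a plain dict) holding only nodes down to `level`."""
    kept = [n for n in nodes if n["Level"] <= level]
    return {
        "type": "treemap",
        "ids": [n["ID"] for n in kept],
        "labels": [n["Name"] for n in kept],
        "parents": [n["Parent_ID"] for n in kept],
    }


def build_figure_dict(nodes):
    """One frame per level plus a slider step that animates to each frame."""
    max_level = max(n["Level"] for n in nodes)
    frames = [{"name": str(lvl), "data": [treemap_trace(nodes, lvl)]}
              for lvl in range(max_level + 1)]
    steps = [{"method": "animate", "label": frame["name"],
              "args": [[frame["name"]],
                       {"mode": "immediate",
                        "frame": {"duration": 300, "redraw": True},
                        "transition": {"duration": 300}}]}
             for frame in frames]
    return {"data": frames[0]["data"], "frames": frames,
            "layout": {"sliders": [{"active": 0, "steps": steps}]}}


toy_nodes = [
    {"ID": "2", "Name": "root", "Parent_ID": "", "Level": 0},
    {"ID": "0", "Name": "left", "Parent_ID": "2", "Level": 1},
    {"ID": "1", "Name": "right", "Parent_ID": "2", "Level": 1},
]
fig_dict = build_figure_dict(toy_nodes)
# With Plotly installed:
#   import plotly.graph_objects as go; go.Figure(fig_dict).show()
```

Because a parent always has a lower level than its children, filtering by level never leaves a kept node without its parent.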

MaartenGr commented 1 year ago

Great! Glad to hear that you found the solution.

doubianimehdi commented 1 year ago

@MaartenGr thank you for your help ! that would be great to have a visualization like that in bertopic :)

MaartenGr commented 1 year ago

@doubianimehdi I cannot make any promises, as I do not want to depend too much on plotly since it might be replaced in the future with a different plotting library, but I will definitely keep it in mind!