giotto-ai / giotto-tda

A high-performance topological machine learning toolbox in Python
https://giotto-ai.github.io/gtda-docs

Confusion with the number of points in a node using plot_static_mapper_graph and graph= pipe.fit_transform #657

Closed jnukpezah closed 1 year ago

jnukpezah commented 1 year ago

Hi, I am running giotto-tda version 0.5.1. I ran the Mapper code on some data using filter_func = umap.UMAP, which runs fine and produces the plot_static_mapper_graph figure. My problem is this: when I run graph = pipe.fit_transform (using the same pipe that I used in plot_static_mapper_graph) and extract node_elements = graph.vs["node_elements"], the node sizes shown in the plot_static_mapper_graph figure do not match the node sizes obtained from node_elements. I am not sure why that is, or whether there is something I am missing. Thanks. Jon

ulupo commented 1 year ago

Hi! Thanks for the details. Would you be able to produce a minimal code example which illustrates the issue?

jnukpezah commented 1 year ago

Hi! Yes, I can. The code example is below. Thanks.

    import umap
    from sklearn.cluster import DBSCAN
    from gtda.mapper import CubicalCover, make_mapper_pipeline, plot_static_mapper_graph

    # Define filter function
    filter_func = umap.UMAP(n_neighbors=5)

    # Define cover
    cover = CubicalCover(kind='balanced', n_intervals=10, overlap_frac=0.2)
    # cover = CubicalCover(n_intervals=10, overlap_frac=0.2)

    # Choose clustering algorithm
    clusterer = DBSCAN(eps=10)

    # Initialise pipeline
    pipe = make_mapper_pipeline(
        filter_func=filter_func,
        cover=cover,
        clusterer=clusterer,
        verbose=True,
        n_jobs=-1,
    )

    # Plot Mapper graph, where df_refined is a pandas DataFrame
    fig = plot_static_mapper_graph(pipe, df_refined, color_data=df_refined)
    fig.show(config={'scrollZoom': True})

    # Node elements from the graph's vertex attributes
    graph = pipe.fit_transform(df_refined)
    node_elements = graph.vs["node_elements"]

jnukpezah commented 1 year ago

When I hover over the nodes in the plot_static_mapper_graph figure and match them to the same nodes in node_elements, the node sizes are different. Thanks.

ulupo commented 1 year ago

Thanks @jnukpezah! I can reproduce using the example provided.

The issue does not seem to come from giotto-tda but rather from UMAP. Indeed, the UMAP class constructor has a random_state parameter, which is by default set to None, meaning that a "random random seed" is generated every time fit is called. Thus, running the UMAP part of the Mapper pipeline gives different results every time.

To have a fully reproducible pipeline, and solve your issue, set random_state in UMAP to some integer value of your choice.
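To illustrate the mechanism, here is a minimal, library-free sketch. It assumes, as the thread implies, that the plotting call and the later pipe.fit_transform each refit the filter function. ToyStochasticFilter and node_sizes are hypothetical stand-ins written for this example; they are not part of giotto-tda or umap-learn and only mimic the role of random_state.

```python
import random
from collections import Counter


class ToyStochasticFilter:
    """Hypothetical stand-in for a stochastic filter function such as UMAP."""

    def __init__(self, random_state=None):
        self.random_state = random_state

    def fit_transform(self, data):
        # random_state=None: the generator seeds itself from OS entropy
        # (or the clock), so every call gives different values.
        # An integer: the generator starts from the same state on every call.
        rng = random.Random(self.random_state)
        return [x + rng.uniform(-1.0, 1.0) for x in data]


def node_sizes(filter_values, n_intervals=4, low=-1.0, high=11.0):
    """Bin filter values into equal-width intervals and count points per bin.

    A crude, hypothetical analogue of counting the points in each Mapper node.
    """
    width = (high - low) / n_intervals
    counts = Counter(
        min(int((v - low) / width), n_intervals - 1) for v in filter_values
    )
    return [counts.get(i, 0) for i in range(n_intervals)]


data = [i * 0.1 for i in range(100)]

unseeded = ToyStochasticFilter()               # no random_state set
seeded = ToyStochasticFilter(random_state=42)  # 42 is an arbitrary choice

# The first fit stands in for the one made while plotting,
# the second for the separate pipe.fit_transform call.
print(node_sizes(unseeded.fit_transform(data)))
print(node_sizes(unseeded.fit_transform(data)))  # usually differs from the line above

# With a fixed seed, both fits agree.
print(node_sizes(seeded.fit_transform(data)) == node_sizes(seeded.fit_transform(data)))  # True
```

In the pipeline from this issue, the corresponding change would be to define the filter as `filter_func = umap.UMAP(n_neighbors=5, random_state=42)`, where 42 is again just an arbitrary integer.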

jnukpezah commented 1 year ago

Hi @ulupo Got it! Thanks for the insight. Will do so! Great library btw!

ulupo commented 1 year ago

Great! Happy to help!