giotto-ai / giotto-tda

A high-performance topological machine learning toolbox in Python
https://giotto-ai.github.io/gtda-docs

[BUG] Mapper colormap for large range variables #399

Closed miltminz closed 4 years ago

miltminz commented 4 years ago

Describe the bug

The function plot_static_mapper_graph colours the nodes of the Mapper graph incorrectly. This appears to happen only for variables with a large value range.

To reproduce

import pandas as pd
import numpy as np
from sklearn.cluster import DBSCAN
from gtda.mapper import (
    CubicalCover,
    make_mapper_pipeline,
    Projection,
    plot_static_mapper_graph
)

# Create df
df = pd.DataFrame(
    {'A': np.random.randint(low=1, high=1000, size=500),
     'B': np.random.normal(size=500),
     'C': np.random.uniform(size=500)})

# Make mapper pipeline
pipe = make_mapper_pipeline(
    filter_func=Projection(columns=["A", "B"]),
    cover=CubicalCover(n_intervals=10, overlap_frac=0.3),
    clusterer=DBSCAN(),
    verbose=False,
    n_jobs=1)

# Plot static mapper
fig = plot_static_mapper_graph(pipe, df, color_by_columns_dropdown=True)
fig.show(config={'scrollZoom': True})

Expected behavior

When 'Column A' is selected in the dropdown menu, each graph node is coloured according to the mean value of column 'A' over the elements in that node, so nodes with different means get different colours.
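To make the expectation concrete, here is a minimal, library-independent sketch of the per-node averaging described above. It does not use giotto-tda and is not how plot_static_mapper_graph is implemented; the names node_elements, node_means and rescale are hypothetical, and the node membership is made up by slicing row indices rather than taken from a fitted Mapper pipeline.

```python
import random
from statistics import mean

random.seed(0)

# Stand-in for column 'A' in the report: 500 integers in [1, 999].
column_a = [random.randint(1, 999) for _ in range(500)]

# Hypothetical node membership: each node is a list of row indices.
# In a real Mapper graph these come from the fitted pipeline, not from slicing.
node_elements = [list(range(start, start + 50)) for start in range(0, 500, 50)]


def node_means(values, nodes):
    """Return the mean of `values` over the row indices of each node."""
    return [mean(values[i] for i in members) for members in nodes]


def rescale(xs):
    """Min-max rescale to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]


means = node_means(column_a, node_elements)
# Positions along a colour scale: distinct means give distinct positions.
colour_positions = rescale(means)
```

Under these assumptions, nodes whose mean of 'A' differs end up at different positions on the colour scale, which is the behaviour expected from the dropdown entry for 'Column A'.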

Actual behaviour

When 'Column A' is selected in the dropdown menu, all graph nodes are coloured yellow, which corresponds to high values (above 900). Note that the colouring is correct for columns 'B' and 'C', which have smaller value ranges than 'A'.

Versions

Darwin-19.4.0-x86_64-i386-64bit
Python 3.7.4 (default, Aug 13 2019, 15:17:50) [Clang 4.0.1 (tags/RELEASE_401/final)]
NumPy 1.18.1
SciPy 1.3.1
Joblib 0.14.1
Scikit-learn 0.22.2.post1
Giotto-tda 0.2.1

Additional context