gietema / clusterfun

Explore audio and images with one line of code. Python plotting library for data visualisation.
https://clusterfun.app
Apache License 2.0

Clusterfun

Clusterfun is a Python plotting library for exploring image and audio data. Try the live demo at https://clusterfun.app.

Getting started

Clusterfun can be installed with pip:

pip install clusterfun

Clusterfun requires Python 3.8 or higher.

Plots accept data as a pandas DataFrame; pandas is installed automatically if it is not already present. No account, payment, or internet connection is required to use clusterfun. Clusterfun is open source and free to use.

A simple example

import pandas as pd
import clusterfun as clt

df = pd.read_csv("https://raw.githubusercontent.com/gietema/clusterfun-data/main/wiki-art.csv")
clt.scatter(df, x="x", y="y", media="img_path", color="painter")

Example plot

Data can be hosted locally or on AWS S3.

As you can see, a clusterfun plot takes a pandas dataframe as input, together with the names of the columns to use for the visualisation. In this respect it is similar to seaborn or plotly. But in clusterfun, you can:

This makes clusterfun ideal for quickly visualising image data, which can be useful in the context of building datasets, exploring edge cases and debugging model performance.
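When building a dataset from files on disk, the dataframe first has to be assembled from the media paths. The following is a minimal sketch of one way to do that, not part of clusterfun itself: the helper names `collect_image_rows` and `show_grid`, the `label` column, and the list of image suffixes are all assumptions made for illustration. Only `clt.grid(df, media="img_path")` is taken from this README.

```python
from pathlib import Path
from typing import Dict, List

# Suffixes treated as images here are an illustrative choice, not a clusterfun rule.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def collect_image_rows(root: Path) -> List[Dict[str, str]]:
    """Return one row per image under `root`, labelled with its parent folder name."""
    rows = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            rows.append({"img_path": str(path), "label": path.parent.name})
    return rows


def show_grid(rows: List[Dict[str, str]]):
    """Plot the collected rows as a grid (hypothetical wrapper).

    pandas and clusterfun are imported here so that the helper above
    can be used and checked without either package installed.
    """
    import pandas as pd
    import clusterfun as clt

    return clt.grid(pd.DataFrame(rows), media="img_path")
```

Calling `show_grid(collect_image_rows(Path("my_images")))` would then open a grid of every image found, assuming a local folder named `my_images` exists.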

Main features

Default parameters

The default parameters for the plot types are as follows:

Plot types

The following plot types are available:

Bar chart

def bar_chart(
    df: pd.DataFrame,
    x: str,
    media: str,
    color: Optional[str] = None,
    ...
) -> Path:

Parameters

Example

import pandas as pd
import clusterfun as clt

df = pd.read_csv("https://raw.githubusercontent.com/gietema/clusterfun-data/main/wiki-art.csv")
clt.bar_chart(df, x="painter", media="img_path", color="style")

Example bar

Confusion matrix

def confusion_matrix(
    df: pd.DataFrame,
    y_true: str,
    y_pred: str,
    media: str,
    ...
) -> Path:

Parameters

Example

import pandas as pd
import clusterfun as clt

df = pd.read_csv("https://raw.githubusercontent.com/gietema/clusterfun-data/main/cifar10.csv")
clt.confusion_matrix(df, y_true="label", y_pred="pred", media="img_path")

Example confusion matrix
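To make the inputs concrete, the sketch below shows what a confusion matrix counts in general: the number of rows for each (true label, predicted label) pair. It is not clusterfun's implementation. The helper names and the sample rows are hypothetical; the column names `label`, `pred` and `img_path` follow the example above.

```python
from collections import Counter
from typing import Dict, Iterable, List


def confusion_cells(rows: Iterable[Dict[str, str]], y_true: str, y_pred: str) -> Counter:
    """Count rows per (true label, predicted label) pair."""
    return Counter((row[y_true], row[y_pred]) for row in rows)


def misclassified(rows: Iterable[Dict[str, str]], y_true: str, y_pred: str) -> List[Dict[str, str]]:
    """Return the rows whose prediction differs from the true label."""
    return [row for row in rows if row[y_true] != row[y_pred]]


# Hypothetical rows; in practice these would come from the dataframe.
rows = [
    {"img_path": "img/0.png", "label": "cat", "pred": "cat"},
    {"img_path": "img/1.png", "label": "cat", "pred": "dog"},
    {"img_path": "img/2.png", "label": "dog", "pred": "dog"},
    {"img_path": "img/3.png", "label": "cat", "pred": "cat"},
]

cells = confusion_cells(rows, y_true="label", y_pred="pred")
errors = misclassified(rows, y_true="label", y_pred="pred")
```

Each key in `cells` corresponds to one cell of the matrix, and `errors` holds the rows that fall off the diagonal, which are usually the ones worth inspecting first.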

Grid

def grid(
    df: pd.DataFrame,
    media: str,
    ...
) -> Path:

Parameters

Example

import pandas as pd
import clusterfun as clt

df = pd.read_csv("https://raw.githubusercontent.com/gietema/clusterfun-data/main/wiki-art.csv")
clt.grid(df, media="img_path")

Example grid

Histogram

def histogram(
    df: pd.DataFrame,
    x: str,
    media: str,
    bins: int = 20,
    ...
) -> Path:

Parameters

Example

import pandas as pd
import clusterfun as clt

df = pd.read_csv("https://raw.githubusercontent.com/gietema/clusterfun-data/main/wiki-art.csv")
clt.histogram(df, x="brightness", media="img_path")

Example histogram
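The `bins` parameter sets how many intervals the values of `x` are divided into. As a rough illustration, the sketch below assigns values to equal-width bins between the minimum and maximum. Equal-width binning is an assumption here, since this README does not state how clusterfun computes its bin edges, and the function names are hypothetical.

```python
from typing import List, Sequence


def bin_index(value: float, low: float, high: float, bins: int = 20) -> int:
    """Return the index of the equal-width bin that `value` falls into."""
    if high == low:
        return 0  # all values identical: everything lands in the first bin
    width = (high - low) / bins
    # The maximum value would compute to index `bins`, so clamp it into the last bin.
    return min(int((value - low) / width), bins - 1)


def bin_counts(values: Sequence[float], bins: int = 20) -> List[int]:
    """Count how many values fall into each of `bins` equal-width bins."""
    low, high = min(values), max(values)
    counts = [0] * bins
    for value in values:
        counts[bin_index(value, low, high, bins)] += 1
    return counts
```

With more bins each bar covers a narrower range of values, so each bar groups fewer media items.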

Pie chart

def pie_chart(
    df: pd.DataFrame,
    color: str,
    media: str,
) -> Path:

Parameters

Example

import pandas as pd
import clusterfun as clt

df = pd.read_csv("https://raw.githubusercontent.com/gietema/clusterfun-data/main/wiki-art.csv")
clt.pie_chart(df, color="painter", media="img_path")

Example pie

Scatterplot

def scatter(
    df: pd.DataFrame,
    x: str,
    y: str,
    ...
) -> Path:

Parameters

Example

import pandas as pd
import clusterfun as clt

df = pd.read_csv("https://raw.githubusercontent.com/gietema/clusterfun-data/main/wiki-art.csv")
clt.scatter(df, x="x", y="y", media="img_path")

Example scatter

Violin plot

def violin(
    df: pd.DataFrame,
    y: str,
    ...
) -> Path:

Parameters

Example

import pandas as pd
import clusterfun as clt

df = pd.read_csv("https://raw.githubusercontent.com/gietema/clusterfun-data/main/wiki-art.csv")
df = df[df.painter.isin(["Pablo Picasso", "Juan Gris", "Georges Braque", "Fernand Leger"])]
clt.violin(df, y="brightness", media="img_path")

Example violin

Data loading

Clusterfun can load media from local storage and from AWS S3. The values in the dataframe column passed as media determine where each file is loaded from.

import pandas as pd
import clusterfun as clt

df = pd.read_csv("https://raw.githubusercontent.com/gietema/clusterfun-data/main/wiki-art.csv")
clt.grid(df, media="img_path")

AWS S3 media paths should start with s3://. Make sure to set the AWS_REGION environment variable to the region where your data is stored.
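One way to satisfy both requirements from within a script is sketched below. It assumes the environment variable is read when the plot is created, so it is set before any plotting call. The region `eu-west-1`, the bucket name in the usage note, and the helper `non_s3_paths` are placeholders for illustration, not part of clusterfun.

```python
import os
from typing import Iterable, List

# "eu-west-1" is a placeholder; use the region of your own bucket.
# setdefault leaves the variable untouched if it is already set in the shell.
os.environ.setdefault("AWS_REGION", "eu-west-1")


def non_s3_paths(paths: Iterable[str]) -> List[str]:
    """Return the media paths that do not use the s3:// scheme."""
    return [path for path in paths if not path.startswith("s3://")]
```

For a media column that is expected to live entirely on S3, `non_s3_paths(["s3://my-bucket/a.jpg", "local/b.jpg"])` returns the entries that would be treated as local paths instead, which makes a missing prefix easy to spot before plotting.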

Support for Google Cloud Storage is coming soon.