firefly-cpp / NiaARM

A minimalistic framework for Numerical Association Rule Mining
MIT License

Scatter plot and Grouped matrix plot visualization techniques #133

Closed: BukovnikMiha closed this pull request 4 months ago

BukovnikMiha commented 5 months ago

Introduces new visualization techniques for association rule mining:

  1. Scatter plot
  2. Grouped matrix plot

These visualizations make it easier to identify patterns, relationships, and distributions in the mined rules and the underlying datasets.
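A rule scatter plot usually places each rule at its (support, confidence) coordinates and encodes a third metric, such as lift, as color. The minimal sketch below shows only that data mapping in plain Python, without a plotting library. The `Rule` dataclass, `scatter_points`, and the example rules are hypothetical illustrations. They are not the PR's implementation or NiaARM's own `Rule` class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical stand-in for a mined rule (not NiaARM's own Rule class)."""
    antecedent: tuple
    consequent: tuple
    support: float
    confidence: float
    lift: float


def scatter_points(rules, x="support", y="confidence", color="lift"):
    """Return one (x, y, color) point per rule, read from the named metric attributes."""
    return [(getattr(r, x), getattr(r, y), getattr(r, color)) for r in rules]


# Illustrative rules only; the values are made up for the example.
EXAMPLE_RULES = [
    Rule(("outlook=rain",), ("humidity=high",), 0.30, 0.90, 2.1),
    Rule(("outlook=sun",), ("humidity=low",), 0.25, 0.80, 1.6),
]

if __name__ == "__main__":
    for sx, sy, c in scatter_points(EXAMPLE_RULES):
        print(f"support={sx:.2f} confidence={sy:.2f} lift={c:.2f}")
```

Any plotting library can then draw these points. Because the metric names are parameters, the same mapping also covers other axis choices, such as lift against support.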

This addition includes two new functions in the niaarm/visualize.py file, scatter_plot and grouped_matrix_plot.

Both functions take the parameters rules (the mined rules to visualize), metrics (the metrics to display, such as lift, support, and confidence), and interactive (a boolean indicating whether the visualization should have interactive features such as zooming and hover data). grouped_matrix_plot additionally takes k, the number of clusters to display.
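The PR does not describe how grouped_matrix_plot forms its k clusters. One common approach, similar in spirit to the grouped matrix plot in R's arulesViz package, works as follows. It clusters antecedents by their lift toward each consequent using k-means, then shows one aggregated value per (group, consequent) cell. The sketch below assumes that approach. The names `kmeans` and `grouped_matrix`, and the (antecedent, consequent, lift) input format, are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def kmeans(vectors, k, iters=20):
    """Plain k-means; centroids start at the first k vectors so results are reproducible."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(v, centroids[j])))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members (unchanged if empty).
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [mean(col) for col in zip(*members)]
    return labels


def grouped_matrix(rules, k):
    """rules: list of (antecedent, consequent, lift) triples; k: number of antecedent groups."""
    antecedents = sorted({a for a, _, _ in rules})
    consequents = sorted({c for _, c, _ in rules})
    lift = {(a, c): value for a, c, value in rules}
    # One row per antecedent: its lift toward every consequent (0.0 if no such rule).
    vectors = [[lift.get((a, c), 0.0) for c in consequents] for a in antecedents]
    labels = kmeans(vectors, min(k, len(vectors)))
    # Aggregate: mean lift of the rules that fall into each (group, consequent) cell.
    cells = defaultdict(list)
    for a, lab in zip(antecedents, labels):
        for c in consequents:
            if (a, c) in lift:
                cells[(lab, c)].append(lift[(a, c)])
    return {cell: mean(vals) for cell, vals in cells.items()}, dict(zip(antecedents, labels))


# Illustrative rules only; the values are made up for the example.
WEATHER_RULES = [
    ("rain", "umbrella", 2.0),
    ("storm", "umbrella", 2.2),
    ("sun", "sunglasses", 1.8),
    ("heat", "sunglasses", 1.6),
]

if __name__ == "__main__":
    cells, groups = grouped_matrix(WEATHER_RULES, k=2)
    print(groups)
    print(cells)
```

Initializing centroids from the first k antecedents keeps the example deterministic. A practical implementation would more likely use k-means++ initialization or several random restarts.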

It also includes three datasets for testing the new visualizations: weather_data.csv, football_players.csv, and data_developer_salary.csv, which can be found in the datasets folder. They are accompanied by dataset preparation in the examples/visualization_examples/prepare_datasets.py file, which applies preprocessing such as removing duplicate rows and missing values, discretizing data, and selecting relevant columns. The examples/visualization_examples folder also contains a separate folder for each visualization technique, with examples that visualize these datasets.
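The preprocessing steps listed above can be illustrated with the standard library alone. This is a minimal sketch, not the contents of prepare_datasets.py. The `preprocess` function, the equal-width binning, the `binN` labels, and the order of the steps (column selection before deduplication) are all assumptions.

```python
import csv
import io


def preprocess(csv_text, columns, numeric_column, bins=3):
    """Select columns, drop incomplete and duplicate rows, and discretize one numeric column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows, seen = [], set()
    for record in reader:
        # Keep only the relevant columns (missing fields become empty strings).
        row = tuple((record.get(c) or "").strip() for c in columns)
        if any(value == "" for value in row):  # drop rows with missing values
            continue
        if row in seen:  # drop duplicate rows; the first occurrence wins
            continue
        seen.add(row)
        rows.append(dict(zip(columns, row)))
    if not rows:
        return rows

    # Equal-width discretization of the numeric column into `bins` intervals.
    values = [float(r[numeric_column]) for r in rows]
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid division by zero when all values are equal
    for r, v in zip(rows, values):
        index = min(int((v - lo) / width), bins - 1)  # the maximum lands in the last bin
        r[numeric_column] = f"bin{index}"
    return rows


# Illustrative input only; not taken from the PR's datasets.
SAMPLE = """day,temperature,outlook
mon,12.0,rain
tue,12.0,rain
wed,,sun
thu,30.0,sun
fri,21.0,cloudy
"""

if __name__ == "__main__":
    for row in preprocess(SAMPLE, ["temperature", "outlook"], "temperature"):
        print(row)
```

Here the mon and tue rows become duplicates once the day column is dropped, and wed is removed because its temperature is missing.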

Example usage of the new visualization functions:

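```python
from niaarm.visualize import scatter_plot, grouped_matrix_plot

# `rules` holds previously mined rules and `metrics` the measures to display
# (e.g. support, confidence, lift)

# Visualize scatter plot
fig = scatter_plot(rules=rules, metrics=metrics, interactive=True)
fig.show()

# Visualize grouped matrix plot
fig = grouped_matrix_plot(rules=rules, metrics=metrics, k=5, interactive=True)
fig.show()
```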

Also adds new dependencies:

firefly-cpp commented 5 months ago

@zStupan, please review this submission. From the initial prescreening, I believe several additions are not applicable.

firefly-cpp commented 5 months ago

@zStupan, what about datasets and licenses?

zStupan commented 5 months ago

@firefly-cpp oh right. The football players one is taken from Wikipedia, so it's likely under a Creative Commons license, which means we have to include a link to the original Wikipedia article.

The weather data is synthetically generated and is released under the CC0 (public domain) license, so there aren't any requirements there.

The dev salaries one is under Apache 2.0, which I think means we have to include a copy of the Apache 2.0 license with the dataset.

BukovnikMiha commented 4 months ago

Thank you for your review.

I have made the following updates based on your feedback: