annoviko / pyclustering

pyclustering is a Python, C++ data mining library.
https://pyclustering.github.io/
BSD 3-Clause "New" or "Revised" License

[pyclustering.cluster.xmeans] Specify probabilistic bounds for MNDL #624

Closed annoviko closed 4 years ago

annoviko commented 4 years ago

Introduction: alpha and beta are 0.9 by default. These values can affect X-Means results when the MNDL splitting criterion is used, so it is useful to expose them via **kwargs.

Description: Introduce alpha and beta for MNDL X-Means.
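As a rough illustration of the request, the sketch below shows how optional bounds could be read from **kwargs with the 0.9 default mentioned above. The helper name `read_mndl_bounds`, the constant `DEFAULT_MNDL_BOUND`, and the [0.0, 1.0] range check are assumptions made for this example; this is not the library's actual implementation.

```python
DEFAULT_MNDL_BOUND = 0.9  # default stated in this issue for both alpha and beta


def read_mndl_bounds(**kwargs):
    """Hypothetical helper: extract 'alpha' and 'beta' from keyword arguments."""
    alpha = kwargs.get('alpha', DEFAULT_MNDL_BOUND)
    beta = kwargs.get('beta', DEFAULT_MNDL_BOUND)

    # Assumption: both bounds are probabilities, so they must lie in [0.0, 1.0].
    for name, value in (('alpha', alpha), ('beta', beta)):
        if not 0.0 <= value <= 1.0:
            raise ValueError("'%s' must be in [0.0, 1.0], got %r" % (name, value))

    return alpha, beta


print(read_mndl_bounds())                     # no arguments: falls back to the defaults
print(read_mndl_bounds(alpha=0.5, beta=0.5))  # explicit bounds supplied by the caller
```

Reading the values with `kwargs.get` keeps existing calls working unchanged, since callers that never pass alpha or beta still get the previous default behaviour.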

annoviko commented 4 years ago

Python usage example (alpha and beta are passed as keyword arguments to the constructor):

from pyclustering.cluster import cluster_visualizer
from pyclustering.cluster.xmeans import xmeans, splitting_type
from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
from pyclustering.utils import read_sample
from pyclustering.samples.definitions import FCPS_SAMPLES

# Read sample 'Target' from file.
sample = read_sample(FCPS_SAMPLES.SAMPLE_TARGET)

# Random state.
seed = 1000

# Prepare initial centers - the number of initial centers defines the number of clusters from which X-Means starts its analysis.
amount_initial_centers = 3
initial_centers = kmeans_plusplus_initializer(sample, amount_initial_centers, random_state=seed).initialize()

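# Create an instance of the X-Means algorithm with the MNDL splitting criterion and run it.
# The criterion goes through the constructor's 'criterion' parameter; alpha and beta go through **kwargs.
xmeans_mndl = xmeans(sample, initial_centers, 20,
                     criterion=splitting_type.MINIMUM_NOISELESS_DESCRIPTION_LENGTH,
                     alpha=0.5, beta=0.5, random_state=seed).process()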

# Extract X-Means MNDL clustering results:
mndl_clusters = xmeans_mndl.get_clusters()

# Visualize clustering results
visualizer = cluster_visualizer(1, titles=['MNDL'])
visualizer.append_clusters(mndl_clusters, sample, 0)
visualizer.show()

C++ usage example (methods set_mndl_alpha_bound and set_mndl_beta_bound):

pyclustering::clst::xmeans solver(centers, p_kmax, p_tolerance, pyclustering::clst::splitting_type::MINIMUM_NOISELESS_DESCRIPTION_LENGTH);
solver.set_mndl_alpha_bound(p_alpha);   // Set the alpha probabilistic bound for the MNDL splitting criterion.
solver.set_mndl_beta_bound(p_beta);     // Set the beta probabilistic bound for the MNDL splitting criterion.

[Image: Figure_1]