Efficient and fuzzy clustering based on the CLARA algorithm
The fuzzyclara package tackles two common issues in applied cluster analysis.
First, it includes routines for fuzzy clustering, which avoid the common hard
clustering assumption that each observation is a clear member of exactly one
cluster. Instead, membership probabilities indicate the extent to which the
characteristics of each observation are shaped by the characteristics of several
'typical' clusters. Second, estimating classical clustering algorithms
is often difficult or outright infeasible for large data sets with
thousands of observations. Subsampling-based algorithms building on the CLARA
algorithm are implemented to make estimation feasible in such situations.
Building on these two points, the 'fuzzyclara' package offers routines for all
aspects of a cluster analysis, including the use of user-defined distance
functions and diverse visualization techniques.
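The two ideas described above can be illustrated with a minimal sketch. It deliberately uses the separate 'cluster' package (functions clara() and fanny()) on simulated data rather than the 'fuzzyclara' API, so it only shows the underlying concepts and is not the package's own interface; see the vignette for actual usage. The data, seed, and parameter values are assumptions chosen for illustration.

```r
# Conceptual illustration with the 'cluster' package, NOT the fuzzyclara API.
library(cluster)

set.seed(1)
# Simulated data (assumption): two well-separated groups, 1000 rows each
x <- rbind(
  matrix(rnorm(2000, mean = 0), ncol = 2),
  matrix(rnorm(2000, mean = 5), ncol = 2)
)

# CLARA idea: run k-medoids on several random subsamples, keep the best
# set of medoids, then assign all observations to their nearest medoid
fit_clara <- clara(x, k = 2, samples = 20, sampsize = 100)
table(fit_clara$clustering)

# Fuzzy idea: each observation gets a membership score per cluster
# instead of a single hard label (fitted on a subsample to keep it fast)
idx <- sample(nrow(x), 200)
fit_fanny <- fanny(x[idx, ], k = 2, memb.exp = 2)
head(round(fit_fanny$membership, 3))
```

Each row of the membership matrix sums to one, so an observation lying between two clusters shows up with intermediate scores instead of being forced into a single cluster.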
To get an overview of the functionalities of the package, check out the package vignette.
The most current version from GitHub can be installed via
devtools::install_github("MaxWeigert/fuzzyclara")
If you encounter problems with the package, find bugs, or have suggestions for additional functionality, please open a GitHub issue. Alternatively, feel free to contact us directly via email.
Contributions (via pull requests or otherwise) are welcome. Before you open a pull request or share your updates with us, please make sure that all unit tests pass without errors or warnings. You can run the unit tests by calling
devtools::test()