danilkolikov / fsfc

Feature Selection for Clustering
MIT License

================================
Feature Selection for Clustering
================================

|mit| |docs|

FSFC is a library of feature selection algorithms for clustering.

It's based on the article `"Feature Selection for Clustering: A Review." <http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.295.8115>`_ by S. Alelyani, J. Tang and H. Liu.

The algorithms are covered by tests that check their correctness and compute some clustering metrics. For testing we use open datasets:

Project documentation is available on `Read the Docs <http://fsfc.readthedocs.io/en/latest/>`_.

Implemented algorithms:

Dependencies:

How to use:

The project is currently in an early alpha stage, so it isn't published on PyPI yet.

Because of this, installation is a bit more involved. To use FSFC you should:

  1. Clone the repository to your computer.
  2. Run ``make init`` to install dependencies.
  3. Copy the contents of the ``fsfc`` folder to the source root of your project.
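
The copy in step 3 can also be scripted. The sketch below is only an illustration, not part of FSFC: ``vendor_fsfc`` is a hypothetical helper name, both paths are placeholders, and it assumes that ``fsfc`` should end up as a package folder in the project root so that ``from fsfc.generic import ...`` resolves. Steps 1 and 2 still have to be done beforehand.

.. code:: python

    import shutil
    from pathlib import Path


    def vendor_fsfc(repo_dir, project_root):
        """Copy the ``fsfc`` package from a cloned repository into a project.

        ``repo_dir`` and ``project_root`` are hypothetical paths supplied
        by the caller; nothing here is provided by FSFC itself.
        """
        source = Path(repo_dir) / "fsfc"
        target = Path(project_root) / "fsfc"
        if not source.is_dir():
            raise FileNotFoundError(f"no fsfc folder found in {repo_dir}")
        # dirs_exist_ok=True (Python 3.8+) lets a re-run refresh an existing copy.
        shutil.copytree(source, target, dirs_exist_ok=True)
        return target

For example, ``vendor_fsfc("path/to/clone", "path/to/project")`` would create ``path/to/project/fsfc``; running it again after pulling new changes overwrites the copied files.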

After that, you can use feature selectors as follows:

.. code:: python

    import numpy as np
    from fsfc.generic import NormalizedCut
    from sklearn.pipeline import Pipeline
    from sklearn.cluster import KMeans

    # Dataset to cluster
    data = np.array([...])

    # Select features first, then cluster the reduced data
    pipeline = Pipeline([
        ('select', NormalizedCut(3)),
        ('cluster', KMeans())
    ])
    pipeline.fit_predict(data)

How to support:

You can support development by testing, reporting bugs, or opening pull requests.

The project has tests, which can be run with the command ``make test``.

The code also has Sphinx documentation, which can be built with the command ``make html``. The documentation uses numpydoc, so it must be installed on the system first. To install it, run ``pip install numpydoc``.

References:

.. |mit| image:: https://img.shields.io/github/license/mashape/apistatus.svg
.. |docs| image:: https://readthedocs.org/projects/fsfc/badge/?version=latest
    :target: http://fsfc.readthedocs.io/en/latest/?badge=latest
    :alt: Documentation Status