PIA-Group / BioSPPy

Biosignal Processing in Python

Clustering Example #96

Closed Torky24 closed 2 years ago

Torky24 commented 2 years ago

I was hoping there could be an example for "biosppy.clustering" showing how it ties in with, for example, acceleration data. I noticed that it lacks documentation on the website, so I was hoping for some clarification if possible.

Thank you!

afonsof3rreira commented 2 years ago

Hi @Torky24, thank you for the feedback!

The biosppy.clustering module provides a library of functions, each using a different clustering algorithm. If you go through the documentation (see the docstrings) of these functions, you'll notice the input argument data, which is "An m by n array of m data samples in an n-dimensional space". This means you need to feed your clustering algorithm of choice a 2D numpy array with $M$ rows (your samples) and $N$ columns (the features of each sample). The other parameters essentially tune how the clustering algorithm works (e.g. changing k=2 to k=3 in biosppy.clustering.kmeans forces the algorithm to group the data into 3 clusters instead of 2).
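As a rough sketch of what that input looks like, the snippet below builds a synthetic $M \times N$ array and wraps the call to biosppy.clustering.kmeans in a small helper. The helper name `run_kmeans` and the synthetic data are only illustrative, and the assumption that the result exposes a "clusters" entry mapping each cluster label to sample indices should be checked against the function's docstring.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for real features: M = 60 samples, N = 2 features,
# drawn around two well-separated centers.
group_a = rng.normal(loc=0.0, scale=0.3, size=(30, 2))
group_b = rng.normal(loc=3.0, scale=0.3, size=(30, 2))
data = np.vstack([group_a, group_b])  # shape (M, N) = (60, 2)


def run_kmeans(data, k):
    """Hypothetical helper: cluster `data` with biosppy.clustering.kmeans."""
    # Imported lazily so the data preparation above runs without BioSPPy.
    from biosppy import clustering

    out = clustering.kmeans(data=data, k=k)
    # Assumption: the result has a "clusters" entry that maps each
    # cluster label to the indices of the samples assigned to it.
    return out["clusters"]
```

With BioSPPy installed, `run_kmeans(data, k=2)` should group the 60 rows into 2 clusters, and `run_kmeans(data, k=3)` would force 3.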

Regarding the combination of clustering + acceleration (ACC) features, both the time- and frequency-domain features apply to any given signal segment. In order to use these features as the columns (the $N$ dimension) of the input to a clustering algorithm, you'll need to build your own pipeline:

  1. Define the duration of your samples (what are you trying to classify?).
  2. Decide which features are relevant for the contexts you're classifying. The functions inside biosppy.signals.acc extract time- and frequency-domain features.
  3. Feed the ACC signal segments defined in step 1 into the feature extractor you chose in step 2.
  4. From the time- and frequency-domain features extracted for each ACC sample, you can extract more "condensed" information, namely 1-D features (e.g. compute skewness from the FFT or variance from Signal Magnitude; there are a lot of degrees of freedom).
  5. Build the data numpy array and feed it into the clustering algorithm. The $M$ rows will correspond to your $M$ samples of ACC signals and the $N$ columns will be your $N$ 1-D features.
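The steps above can be sketched roughly as follows. This is a minimal illustration, not the library's own pipeline: the ACC recording is synthetic, the helpers `segment` and `features` are hypothetical, and the three features are computed directly with numpy as stand-ins for whatever you would take from biosppy.signals.acc.

```python
import numpy as np


def segment(acc, window):
    """Split a (T, 3) ACC recording into non-overlapping (window, 3) segments."""
    n_segments = acc.shape[0] // window
    return acc[: n_segments * window].reshape(n_segments, window, 3)


def features(seg, sampling_rate):
    """Reduce one (window, 3) segment to a 1-D feature vector of length N = 3."""
    magnitude = np.linalg.norm(seg, axis=1)      # signal magnitude per time step
    centered = magnitude - magnitude.mean()      # remove the DC component
    spectrum = np.abs(np.fft.rfft(centered))
    freqs = np.fft.rfftfreq(magnitude.size, d=1.0 / sampling_rate)
    return np.array([
        magnitude.mean(),             # time domain: mean magnitude
        magnitude.var(),              # time domain: variance of the magnitude
        freqs[np.argmax(spectrum)],   # frequency domain: strongest frequency
    ])


# Step 1: choose the sample duration (here 2 s windows at 100 Hz).
sampling_rate = 100.0
window = 200

# Synthetic 60 s recording: noise around 1 g on the z axis, with a
# sinusoidal movement added to the x axis during the second half.
rng = np.random.default_rng(0)
n_samples = 6000
t = np.arange(n_samples) / sampling_rate
acc = rng.normal(0.0, 0.02, size=(n_samples, 3))
acc[:, 2] += 1.0
moving = t >= 30.0
acc[moving, 0] += 0.5 * np.sin(2 * np.pi * 2.0 * t[moving])

# Steps 2-5: extract features per segment and stack them into the data array.
segments = segment(acc, window)
data = np.array([features(s, sampling_rate) for s in segments])  # (M, N) = (30, 3)
```

The resulting `data` array can then be passed to the clustering function of your choice, e.g. biosppy.clustering.kmeans(data=data, k=2). Since the features have very different scales, standardizing each column first may help distance-based algorithms.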

Please let me know if you're looking for a specific use case or specific ACC features.