Waikato / moa

MOA is an open source framework for Big Data stream mining. It includes a collection of machine learning algorithms (classification, regression, clustering, outlier detection, concept drift detection and recommender systems) and tools for evaluation.
http://moa.cms.waikato.ac.nz/
GNU General Public License v3.0
603 stars 352 forks

Standardisation filter #238

Closed pass-always closed 2 years ago

pass-always commented 2 years ago

This filter standardises instances in a stream. This PR corrects the previous algorithm and adds two more: Welford's online algorithm and the two-pass algorithm.

The z-score is used to standardise the values of a normal distribution. For more information: https://en.wikipedia.org/wiki/Standard_score. The formula is z = (x - μ) / σ, where x is the raw value, μ is the mean of the population, and σ is the standard deviation of the population, computed as the square root of the variance.

There are three algorithms for calculating variance. For more information: https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Computing_shifted_data

  1. Naive algorithm
  2. Welford's online algorithm
  3. Two-pass algorithm

You can check the results to compare the accuracy in Outputs.xlsx. In the Output4 sheet, there are multiple columns calculated by the different algorithms.

Outputs.xlsx
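
For readers who want to see how the three variance algorithms differ, here is a minimal sketch in Java. It is not the code from this PR: the class `VarianceSketch` and its method names are hypothetical, it works on a plain `double[]` rather than on MOA instances, and it assumes the population variance (dividing by n), in line with the definition of σ above.

```java
// Illustrative sketch only: the class and method names are hypothetical
// and are not part of MOA's API or of this PR.
final class VarianceSketch {

    // Naive algorithm: one pass accumulating the sum and the sum of squares.
    // It can lose precision through cancellation when the mean is large
    // compared with the spread of the values.
    static double naive(double[] xs) {
        double sum = 0.0;
        double sumSq = 0.0;
        for (double x : xs) {
            sum += x;
            sumSq += x * x;
        }
        int n = xs.length;
        return (sumSq - sum * sum / n) / n;
    }

    // Welford's online algorithm: the running mean and the running sum of
    // squared deviations (m2) are updated once per value, which suits streams.
    static double welford(double[] xs) {
        long n = 0;
        double mean = 0.0;
        double m2 = 0.0;
        for (double x : xs) {
            n++;
            double delta = x - mean;   // deviation from the old mean
            mean += delta / n;
            m2 += delta * (x - mean);  // uses the updated mean
        }
        return m2 / n;
    }

    // Two-pass algorithm: compute the mean first, then the squared deviations.
    // It needs the data twice, so it only fits a stream that can be replayed.
    static double twoPass(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        double mean = sum / xs.length;
        double sumSqDev = 0.0;
        for (double x : xs) {
            double d = x - mean;
            sumSqDev += d * d;
        }
        return sumSqDev / xs.length;
    }

    // z = (x - mean) / sd; returns 0 for a constant attribute (an assumption
    // made here to avoid dividing by zero).
    static double zScore(double x, double mean, double variance) {
        double sd = Math.sqrt(variance);
        return sd == 0.0 ? 0.0 : (x - mean) / sd;
    }
}
```

For the values 2, 4, 4, 4, 5, 5, 7, 9 the mean is 5 and all three methods give a population variance of 4, so the value 9 standardises to (9 - 5) / 2 = 2. The methods start to disagree only when the values carry a large common offset, which is where the naive formula loses precision.
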
hmgomes commented 2 years ago

Hi @pass-always

Please check your commits before creating the PR. This PR should not include changes to AdaptiveRandomForestRE. The change to WriteStreamToARFFFile is relevant, but it should also go in a separate PR.

Regards, Heitor