Closed DK96-OS closed 2 years ago
There are at least three key Outlier methods:
These methods require a definition of what an outlier is and a technique for detecting one. The threshold may be specified as a number of Standard Deviations, or as a confidence interval.
There may be an option for including/excluding the outlier in the calculations.
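As a rough illustration of the Standard Deviation approach, here is a minimal sketch in Java. The class name `OutlierSketch`, the parameter `maxSd`, and the choice of the sample standard deviation are assumptions made for this example; they are not taken from the project's code. Only the method name `identifyOutliers` comes from this issue, and its real signature may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not the project's API.
final class OutlierSketch {

    /**
     * Returns the indices of values lying more than maxSd sample
     * standard deviations away from the mean. The mean and SD are
     * computed over all values, including any outliers.
     */
    static List<Integer> identifyOutliers(double[] values, double maxSd) {
        List<Integer> indices = new ArrayList<>();
        if (values.length < 2) return indices; // SD is undefined

        double sum = 0;
        for (double v : values) sum += v;
        double mean = sum / values.length;

        double squares = 0;
        for (double v : values) squares += (v - mean) * (v - mean);
        double sd = Math.sqrt(squares / (values.length - 1));
        if (sd == 0) return indices; // all values are identical

        for (int i = 0; i < values.length; i++) {
            if (Math.abs(values[i] - mean) > maxSd * sd) indices.add(i);
        }
        return indices;
    }
}
```

Because the outlier is included in the calculation here, it inflates the Standard Deviation and makes itself harder to detect, which is one reason an include/exclude option could matter.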
Edit: This third method is not important to the DeviationPolicy as one with basic knowledge of statistics can use the DistributionCharacteristics and easily write a line to check the last element.
It may be possible to use generics to combine all of the List types into one method signature. This would be ideal from a user perspective, however performance impact should be considered at some later date.
It is actually not practical to use generics. There are many obstacles and the set of reasonable workarounds has been exhausted.
Failed Tests:
The NumberListType testing function runOnAllLists needs to be extracted into a public function-providing object in the test source directories.
A DeviationPolicy would generally be applied to a specific set of distributions, for which there are a set of expected values.
This set may represent a series of measurements, for which there is a physical (or digital) lower bound, a small expected range of values, and rare large values. The ideal outlier policy is to look for outliers only much greater than the range of expected values, and ignore lower values, even if they are over 6 Standard Deviations (SD) below the mean.
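A one-sided policy like the one described above could be sketched as follows. The class `UpperOnlyPolicy` and its parameter `maxSdAbove` are hypothetical names for this example, and the mean and SD are passed in directly rather than read from DistributionCharacteristics, whose interface is not shown in this issue.

```java
// Hypothetical one-sided policy for illustration only.
final class UpperOnlyPolicy {
    private final double maxSdAbove;

    UpperOnlyPolicy(double maxSdAbove) {
        this.maxSdAbove = maxSdAbove;
    }

    /**
     * Flags a value only when it lies more than maxSdAbove standard
     * deviations above the mean. Values below the mean are never
     * flagged, however far away they are.
     */
    boolean isOutlier(double value, double mean, double sd) {
        return sd > 0 && (value - mean) > maxSdAbove * sd;
    }
}
```

With a mean of 100 and an SD of 10, a policy built with `maxSdAbove = 3` would flag 200 but accept 30, even though 30 is 7 SD below the mean.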
Should DeviationPolicy maintain an instance of DistributionCharacteristics?
This branch needs to be merged soon. There are important project structure modifications to be made. Anything too time-consuming to resolve will become a new issue.
The last thing to do before merging:
identifyOutliers functions on double and long type arrays
Create static library functions for dealing with outliers.
Define an Outlier Policy that contains the options for outlier removal.
Enable the policy to be extensible so that different techniques for identifying outliers can be utilized.
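One possible shape for an extensible policy is to separate the detection technique from the removal option. In the sketch below, `OutlierDetector`, `OutlierPolicySketch`, and the `removeOutliers` flag are all hypothetical names chosen for illustration; the actual design of the policy is still open in this issue.

```java
import java.util.Arrays;

// Hypothetical extension point: each detection technique implements this.
interface OutlierDetector {
    boolean isOutlier(double value, double mean, double sd);
}

// Hypothetical policy holding a detector plus a removal option.
final class OutlierPolicySketch {
    private final OutlierDetector detector;
    private final boolean removeOutliers;

    OutlierPolicySketch(OutlierDetector detector, boolean removeOutliers) {
        this.detector = detector;
        this.removeOutliers = removeOutliers;
    }

    /** Returns a copy of the values, with outliers dropped if removal is enabled. */
    double[] apply(double[] values, double mean, double sd) {
        if (!removeOutliers) return values.clone();
        return Arrays.stream(values)
                .filter(v -> !detector.isOutlier(v, mean, sd))
                .toArray();
    }
}
```

A new technique can then be supplied as a lambda, for example `(v, mean, sd) -> Math.abs(v - mean) > 2 * sd`, without changing the policy class itself.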