ekstroem / dataReporter


Potential bug with the `identifyOutliers` check function #4

nischalshrestha closed this issue 2 years ago

nischalshrestha commented 2 years ago

I was exploring the dataReporter::identifyOutliers function on the airquality dataset and noticed that it reported 1 as the only possible outlier, which doesn't seem to make sense:

require(dataReporter)
#> Loading required package: dataReporter

# the value of 1 is suggested as outlier
identifyOutliers(airquality$Ozone)
#> Note that the following possible outlier values were detected: 1.

# however, `boxplot.stats` reports the more reasonable
# values for the outliers
boxplot.stats(airquality$Ozone)$out
#> [1] 135 168

Created on 2022-02-17 by the reprex package (v1.0.0)

Could this be a bug perhaps? Please let me know if I'm not using the check correctly.

nischalshrestha commented 2 years ago

It looks like identifyOutliersTBStyle is what I should use to match the boxplot method:

require(dataReporter)
#> Loading required package: dataReporter
boxplot.stats(airquality$Ozone)$out
#> [1] 135 168
identifyOutliersTBStyle(airquality$Ozone)
#> Note that the following possible outlier values were detected: 135, 168.

Created on 2022-02-17 by the reprex package (v1.0.0)

I can close this if the identifyOutliers check is behaving as expected!

ekstroem commented 2 years ago

It is a "feature" in the sense that the default outlier detection algorithm is based on code from the robustbase package, which uses an asymmetric outlier detection approach that differs from the traditional Tukey limits.

The identifyOutliersTBStyle function returns outliers that match what is seen in boxplots.

So it is not a bug, and you have found the workaround that produces results resembling what is seen elsewhere in base R.
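
To illustrate the difference between the two approaches, here is a minimal sketch in base R. It is not the package's actual implementation: the helper names (tukey_fences, adjusted_fences, flag_outliers) are hypothetical, and the skew-adjusted limits follow the adjusted-boxplot rule of Hubert and Vandervieren (2008), which is assumed here to be close to what the robustbase-based default does. The medcouple is taken from robustbase::mc when that package is installed.

```r
# Classic Tukey limits: symmetric, k * IQR beyond each quartile.
# Note: boxplot.stats() uses hinges rather than quantile(), so the limits
# can differ slightly, but the idea is the same.
tukey_fences <- function(x, k = 1.5) {
  x <- x[!is.na(x)]
  q <- stats::quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  c(lower = q[1] - k * iqr, upper = q[2] + k * iqr)
}

# Skew-adjusted limits (assumed rule, see the text above).
# `mc` is the medcouple, a robust skewness measure in [-1, 1].
# For right-skewed data (mc > 0) the lower limit moves towards the first
# quartile and the upper limit moves away from the third quartile.
adjusted_fences <- function(x, mc, k = 1.5) {
  x <- x[!is.na(x)]
  q <- stats::quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  if (mc >= 0) {
    c(lower = q[1] - k * exp(-4 * mc) * iqr,
      upper = q[2] + k * exp(3 * mc) * iqr)
  } else {
    c(lower = q[1] - k * exp(-3 * mc) * iqr,
      upper = q[2] + k * exp(4 * mc) * iqr)
  }
}

# Return the non-missing values that fall outside the given limits.
flag_outliers <- function(x, fences) {
  x <- x[!is.na(x)]
  x[x < fences[["lower"]] | x > fences[["upper"]]]
}

oz <- airquality$Ozone

# Symmetric limits: flags only the two largest values (135 and 168).
print(flag_outliers(oz, tukey_fences(oz)))

# Skew-adjusted limits, only if robustbase is available.
if (requireNamespace("robustbase", quietly = TRUE)) {
  mc_oz <- robustbase::mc(oz[!is.na(oz)])
  print(flag_outliers(oz, adjusted_fences(oz, mc_oz)))
}
```

Because the Ozone values are right-skewed, a skew-adjusted rule tolerates large values but is strict about small ones, which is how a value as low as 1 can be flagged while 135 and 168 are not.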

And thanks for using the package and giving comments, feedback and questions through GitHub. We love that!

nischalshrestha commented 2 years ago

Thank you for the explanation! That clears it up well. I will close this issue since it's not a bug.