Comment written by Felix Kluxen on August 17, 2020 09:27:12:
Dear Antoine,
thank you for this helpful post.
Just my two cents: I think it sometimes makes sense to formally distinguish two classes of outliers: extreme values and mistakes. Extreme values are statistically and philosophically more interesting, because they are possible but unlikely responses -- such as in your height example. Hawkins considers outliers as values that deviate so much from other observations one might suppose a different underlying sampling mechanism - which is another interesting take on this.
Cheers, Felix
Hawkins, D. M., 1980. Identification of outliers. Chapman and Hall, London; New York.
Comment written by Antoine Soetewey on August 17, 2020 10:32:36:
Dear Felix,
Thanks for your comment. The article has been updated accordingly (see the first and fourth paragraphs of the introduction). Feel free to let me know if there is any inconsistency.
Regards,
Antoine
Comment written by Felix Kluxen on August 17, 2020 11:30:30:
Excellent! The elephant in the room with statistically identified outliers (here, values that are probably not mistakes) is obviously that you cannot solve the issue of what researchers should do with the information, as you write. This really depends on the research question (e.g. subsets, responder/non-responder, etc.) and usually requires a surprising amount of reflection on the researcher's side... or the willingness to think the model assumptions through. If a statistical test result relies on a single influential value, this should caution the researcher against making overambitious claims.
Cheers, Felix
Comment written by Antoine Soetewey on August 17, 2020 12:15:18:
You're totally right, outliers require thoughtful reflection and caution for many statistical analyses!
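One way to see how much a result leans on a single influential value is a leave-one-out check: rerun the test with each observation dropped in turn and compare the p-values. The sketch below is only an illustration, assuming made-up data and a Pearson correlation test (neither comes from the post); `x`, `y`, `p_all` and `p_loo` are hypothetical names.

```r
# Hypothetical data: nine points with almost no trend, plus one extreme point
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)
y <- c(5, 3, 6, 4, 5, 3, 6, 4, 5, 30)

# p-value of the Pearson correlation test on all observations
p_all <- cor.test(x, y)$p.value

# Leave-one-out: p-value of the same test when observation i is dropped
p_loo <- sapply(seq_along(x), function(i) cor.test(x[-i], y[-i])$p.value)

p_all             # p-value with every observation included
round(p_loo, 3)   # one p-value per dropped observation
which.max(p_loo)  # observation whose removal weakens the result the most
```

With these made-up numbers the apparent association rests on the tenth observation alone, which is the kind of situation that calls for cautious claims rather than automatic deletion.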
Dear Antoine,
This is very helpful indeed. I just found a key to detecting outliers formally for my project, thanks to this write-up.
Many thanks,
Duncan
Glad you find it useful!
Comment written by vijayarajamanickam on December 03, 2020 12:26:17:
Dear Antoine,
I tried to detect outliers using this script:
out <- boxplot.stats(dat$hwy)$out
out_ind <- which(dat$hwy %in% c(out))
out_ind
Most of them work well, but in some cases it shows integer(0).
Could you please help me with this?
Many thanks,
vijay
Comment written by Antoine Soetewey on December 03, 2020 18:00:30:
Dear,
When you have the result:
integer(0)
it simply means that there is no outlier according to this method.
If you run boxplot(dat$hwy), you will see that there are no potential outliers as defined by this method.
Hope this helps.
Regards,
Antoine
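To make the integer(0) case concrete, here is a minimal sketch of the same two calls on two made-up vectors (an assumption for illustration, not the dat$hwy data from the post): one with a clearly extreme value and one without.

```r
# Hypothetical data, not taken from the post
with_out <- c(10, 12, 11, 13, 12, 11, 40)  # 40 lies far from the rest
no_out   <- c(10, 12, 11, 13, 12, 11, 12)  # no extreme value

# Values flagged by the boxplot (IQR) criterion
out1 <- boxplot.stats(with_out)$out
out2 <- boxplot.stats(no_out)$out

# Positions of the flagged values in each vector
ind1 <- which(with_out %in% out1)
ind2 <- which(no_out %in% out2)

ind1          # position of the flagged value
ind2          # integer(0): nothing was flagged
length(ind2)  # 0, a convenient check before subsetting
```

Checking `length(ind2) > 0` before removing rows is worthwhile, because negative indexing with an empty vector (for example `no_out[-ind2]`) returns an empty result instead of the full data.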
Hi Antoine, Its been. Actually, I am looking for more on winsorizing outliers in R by replacing them rather than deleting them. Any guidance would be very helpful. Kind regards
If you do not want to simply remove outliers, you can indeed use "Winsorization" which is a technique to replace extreme data values with less extreme values.
See for instance the Winsorize() function in R, or this article.
Hope this helps.
Regards, Antoine
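As a rough illustration of the idea, winsorization can be sketched in base R by capping values at chosen quantiles. The helper name `winsorize_sketch` and the 5%/95% limits are assumptions made for this example; they are not taken from the post or from any package.

```r
# Hypothetical helper (not from the post or from any package):
# values below/above the chosen quantiles are replaced by those quantiles
winsorize_sketch <- function(x, probs = c(0.05, 0.95)) {
  limits <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  pmin(pmax(x, limits[1]), limits[2])
}

# Made-up data with one very low and one very high value
x <- c(1, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 100)
x_w <- winsorize_sketch(x)

range(x)                  # original extremes
range(x_w)                # extremes pulled in to the 5% and 95% quantiles
length(x_w) == length(x)  # no observation is deleted
```

Packaged implementations such as the Winsorize() function mentioned above do the same job; their argument names can differ between versions, so it is worth checking the documentation of the installed version.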
Antoine, many thanks. This is helpful.
Regards, Duncan
Outliers detection in R - Stats and R
Learn how to detect outliers in R thanks to descriptive statistics and via the Hampel filter, the Grubbs, the Dixon and the Rosner tests for outliers
https://statsandr.com/blog/outliers-detection-in-r/