UBC-MDS / forest-fire-area-prediction

This project aims to predict the burned area of forest fires in the northeast region of Portugal, using meteorological and soil moisture data.
https://ubc-mds.github.io/forest-fire-area-prediction/reports/forest_fire_analysis_report.html
MIT License

Outlier Detection to Improve our Results Scores #36

Closed: Anahita97 closed this issue 2 years ago

Anahita97 commented 2 years ago

I believe it is important to detect and handle outliers, so that we can check whether doing so improves our scores. I worked on the outliers and used a statistical method called Cook's distance to identify them. In total, I found 10 outliers. I have inserted a screenshot below: any observation above the red line (the threshold 4/n, where n is the number of observations) is identified as an outlier.

[Screenshot, 2021-11-25: Cook's distance for each observation, with the 4/n threshold drawn as a red line]
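The thread does not include the code behind the screenshot. As a rough sketch only, here is how Cook's distance and the 4/n rule could be computed with NumPy. It assumes an ordinary least-squares fit of the target on the predictors with an intercept; the function names `cooks_distance` and `flag_outliers` are hypothetical and not taken from the repository. For observation i, it uses D_i = e_i^2 / (p s^2) * h_ii / (1 - h_ii)^2, where e_i is the residual, h_ii the leverage, p the number of fitted parameters and s^2 the mean squared error.

```python
import numpy as np


def cooks_distance(X, y):
    """Cook's distance for each row of an OLS fit of y on X.

    Assumption: plain least squares with an intercept column added here.
    """
    X = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)
    n, p = X.shape  # p counts the intercept as a parameter

    # Least-squares coefficients and residuals
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta

    # Leverages h_ii: diagonal of the hat matrix X (X'X)^-1 X'
    xtx_inv = np.linalg.pinv(X.T @ X)
    h = np.einsum("ij,jk,ik->i", X, xtx_inv, X)

    # Residual variance estimate (mean squared error)
    s2 = resid @ resid / (n - p)
    return resid**2 / (p * s2) * h / (1 - h) ** 2


def flag_outliers(X, y):
    """Indices whose Cook's distance exceeds the 4/n threshold."""
    d = cooks_distance(X, y)
    threshold = 4 / len(d)
    return np.flatnonzero(d > threshold), d, threshold


# Usage on synthetic data (hypothetical, not the forest fire dataset)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 2))
y_demo = X_demo @ np.array([1.5, -2.0]) + rng.normal(scale=0.5, size=50)
y_demo[7] += 15  # plant one obvious outlier
idx, d, threshold = flag_outliers(X_demo, y_demo)
print(idx, threshold)
```

The 4/n cutoff is a common rule of thumb rather than a formal test, so the number of flagged points depends on which model and which predictors are used.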
Anahita97 commented 2 years ago

Since most values of the rain variable are 0, we decided to remove rain first and then apply the Cook's distance outlier method. After removing rain, we ended up with 4 outliers.
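One way to make the "mostly 0" check explicit before rerunning Cook's distance is to measure the share of exact zeros in each column and drop the columns above a cutoff. The sketch below is a hypothetical illustration: the helper names, the 0.75 cutoff and the toy rows are assumptions, and only the column name `rain` comes from the thread (`temp`, `RH` and `wind` are illustrative).

```python
import numpy as np


def mostly_zero_columns(X, names, min_zero_share=0.75):
    """Names of columns whose share of exact zeros is at least min_zero_share.

    Assumption: the 0.75 default is an arbitrary illustrative cutoff.
    """
    X = np.asarray(X, dtype=float)
    zero_share = (X == 0).mean(axis=0)
    return [name for name, share in zip(names, zero_share) if share >= min_zero_share]


def drop_columns(X, names, to_drop):
    """Return X and its column names without the columns listed in to_drop."""
    keep = [i for i, name in enumerate(names) if name not in to_drop]
    return np.asarray(X)[:, keep], [names[i] for i in keep]


# Toy rows (hypothetical values), columns: temp, RH, wind, rain
names = ["temp", "RH", "wind", "rain"]
X = np.array([
    [20.1, 45.0, 4.0, 0.0],
    [18.3, 51.0, 2.7, 0.0],
    [25.0, 30.0, 5.4, 0.0],
    [22.2, 40.0, 3.1, 0.0],
    [19.8, 55.0, 1.8, 0.2],
])
to_drop = mostly_zero_columns(X, names)
X_reduced, kept = drop_columns(X, names, to_drop)
print(to_drop, kept)
```

The outlier detection would then be repeated on `X_reduced`, which is why the outlier count can change once a near-constant predictor is removed.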