AmandaFranklinRyan / SupervisedMachineLearning


Over/Undersampling #1

Open AmandaFranklinRyan opened 10 months ago

AmandaFranklinRyan commented 10 months ago

From the performance of the random forest model, it looks to me like the model isn't working well for the most expensive properties. I was thinking we could use over/undersampling to correct this. I will look into it and also add tuning parameters to the random forest model to see if that boosts performance. The R squared is currently around 0.75, I think.
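One way to check the impression about expensive properties is to split the error by price band. The sketch below is only an illustration: `error_by_price_band` is a hypothetical helper, the rents and predictions are made-up numbers, and the real model's predictions would need to be plugged in instead.

```python
from statistics import mean, quantiles


def error_by_price_band(y_true, y_pred, n_bands=4):
    """Mean absolute error within each quantile band of the true target."""
    # quantiles() returns n_bands - 1 cut points that split y_true into bands.
    cuts = quantiles(y_true, n=n_bands)
    bands = [[] for _ in range(n_bands)]
    for actual, predicted in zip(y_true, y_pred):
        # Band index = number of cut points the actual value exceeds.
        idx = sum(actual > c for c in cuts)
        bands[idx].append(abs(actual - predicted))
    # None marks a band that received no samples.
    return [mean(b) if b else None for b in bands]


# Made-up rents and predictions, purely for illustration.
rents = [500, 600, 700, 800, 900, 1000, 1500, 3000]
preds = [520, 590, 720, 790, 950, 1100, 1300, 2000]

mae_per_band = error_by_price_band(rents, preds)
print(mae_per_band)  # one value per band, cheapest band first
```

If the last band's error is far larger than the others, that would support the idea that the model struggles at the top of the price range.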

AmandaFranklinRyan commented 10 months ago

I checked the data again and it seems the sample is balanced, so no need to worry about this.

AmandaFranklinRyan commented 10 months ago

This is the boxplot for the original rent data:

image

Perhaps I'm misinterpreting the boxplot, but I thought this meant the data wasn't imbalanced. What should we do with the outliers, though?

For comparison, this is the histogram:

image
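For reference, the points a boxplot draws as outliers are conventionally those more than 1.5 times the interquartile range beyond the quartiles. A minimal sketch of that rule follows, assuming the default 1.5 factor; `iqr_outliers` is a hypothetical helper and the rents are made-up numbers, not the project data.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside the k * IQR fences around the quartiles."""
    q1, _, q3 = quantiles(values, n=4)  # first quartile, median, third quartile
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Made-up rents, purely for illustration.
rents = [500, 600, 700, 800, 900, 1000, 1500, 3000]

flagged = iqr_outliers(rents)
print(flagged)
```

Plotting libraries may compute quartiles slightly differently, so borderline points can differ from what the plot shows. Flagged points are not necessarily errors; for rent data they may simply be genuine expensive properties.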

linanita22 commented 10 months ago

When do we talk about an unbalanced/balanced dataset?