AmandaFranklinRyan / SupervisedMachineLearning

Histograms and variance of variables #3

Open AmandaFranklinRyan opened 11 months ago

AmandaFranklinRyan commented 11 months ago

Here are histograms of the variables that have less than 90% missing values and less than 95% correlation with another variable:

[Image: All Histograms]
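
For reference, a minimal sketch of how a filter like this could be written with pandas. The function name `filter_features` and its parameters are hypothetical and not taken from the repository. It also assumes Pearson correlation on the numeric columns and that one variable of each highly correlated pair is dropped, which may differ from what was actually done:

```python
import numpy as np
import pandas as pd

def filter_features(df, max_missing=0.90, max_corr=0.95):
    """Hypothetical helper: drop sparse columns, then one of each highly correlated pair."""
    # Keep columns whose share of missing values is below the threshold
    missing_share = df.isna().mean()
    kept = df.loc[:, missing_share < max_missing]

    # Absolute pairwise (Pearson) correlation between the numeric columns
    corr = kept.select_dtypes(include="number").corr().abs()
    # Keep only the upper triangle so each pair is checked once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    # Drop the later column of every pair at or above the correlation threshold
    to_drop = [col for col in upper.columns if (upper[col] >= max_corr).any()]
    return kept.drop(columns=to_drop)
```

Which column of a correlated pair gets dropped depends on column order here, so it may be worth checking the dropped list by hand.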

This plot shows the features with the lowest variance:

[Image: Lowest Variance Bar]

On this basis I think it makes sense to drop lat, lon, avg_bauperiode and month. What do you think?
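
One caveat when ranking by variance: raw variance depends on the unit of each variable, so a column like lat or lon can look low-variance simply because its values span a small numeric range. Below is a rough sketch that shows raw variance next to variance after min-max scaling to [0, 1]. The function name `variance_ranking` is hypothetical, and min-max scaling is only one possible normalisation, not necessarily the one used for the plot above:

```python
import pandas as pd

def variance_ranking(df):
    """Hypothetical helper: raw and min-max-scaled variance per numeric column."""
    numeric = df.select_dtypes(include="number")
    value_range = numeric.max() - numeric.min()
    # Constant columns have zero range; replace it with 1 to avoid dividing by zero
    scaled = (numeric - numeric.min()) / value_range.replace(0, 1)
    ranking = pd.DataFrame({
        "raw_variance": numeric.var(),
        "scaled_variance": scaled.var(),
    })
    # Lowest scaled variance first
    return ranking.sort_values("scaled_variance")
```

If a column is near the top of this table on both measures, that is a stronger argument for dropping it than raw variance alone.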

linanita22 commented 11 months ago

I agree with dropping lat, lon and month. But why do you think avg_bauperiode has such low variance? The plot above shows that it takes different values. Also, according to my data profiling file, we have 97.8% of the values for it. I would therefore include it in the model....
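
A quick way to check a single column from both angles could look like the sketch below. The function name `column_summary` is hypothetical, and it assumes the 97.8% refers to the share of non-missing values, which is only one reading of the profiling output:

```python
import pandas as pd

def column_summary(series):
    """Hypothetical helper: completeness and spread of one column."""
    counts = series.value_counts(normalize=True)
    return {
        # Share of rows where the column is not missing
        "present_share": series.notna().mean(),
        # Number of distinct non-missing values
        "n_unique": series.nunique(),
        # Sample variance of the non-missing values
        "variance": series.var(),
        # Share of non-missing rows taken by the most frequent value
        "top_value_share": counts.iloc[0] if len(counts) else float("nan"),
    }
```

For example, `column_summary(df["avg_bauperiode"])` (with `df` standing for the project's DataFrame, an assumed name) would show whether the column is mostly complete but dominated by one value, which could explain both observations.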