Open AmandaFranklinRyan opened 11 months ago
I agree with dropping lat, lon and month. But why do you think avg_bauperiode has such low variance? The plot above shows that it does take different values. Also, according to my data profiling file, 97.8% of its values are present. I would therefore include it in the model....
Here is a histogram showing the variables with less than 90% missing values and less than 95% correlation:
This plot shows the features with the lowest variance:
On this basis I think it makes sense to drop lat, lon, avg_bauperiode and month. What do you think?
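For anyone who wants to reproduce these checks, here is a minimal sketch of how the missing-value, correlation and variance screening could be done with pandas. The function names `select_features` and `variance_report` are hypothetical and not taken from this project; the 90% and 95% thresholds come from the description above. Min-max scaling before computing variance is an assumption on my part, added so that columns on different scales (for example lat versus avg_bauperiode) can be compared fairly.

```python
import numpy as np
import pandas as pd


def select_features(df, max_missing=0.90, max_corr=0.95):
    """Drop columns with too many missing values or that are near-duplicates."""
    # Keep columns whose share of missing values is below the threshold.
    missing_share = df.isna().mean()
    kept = df.loc[:, missing_share < max_missing]

    # Absolute pairwise correlation between the remaining numeric columns.
    corr = kept.select_dtypes("number").corr().abs()
    # Look only at the upper triangle so each pair is checked once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    redundant = [col for col in upper.columns if (upper[col] >= max_corr).any()]
    return kept.drop(columns=redundant)


def variance_report(df):
    """Variance of each numeric column after min-max scaling, lowest first."""
    numeric = df.select_dtypes("number")
    # Constant columns have zero range; avoid dividing by zero.
    span = (numeric.max() - numeric.min()).replace(0, np.nan)
    scaled = (numeric - numeric.min()) / span
    # Constant (or entirely missing) columns are reported with variance 0.
    return scaled.var().fillna(0.0).sort_values()
```

Calling `variance_report(select_features(df))` lists the remaining features from lowest to highest scaled variance, which may help settle whether avg_bauperiode really belongs among the low-variance columns.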