chuangw6 closed this issue 3 years ago
I wrote this in the introduction for EDA: "The data set is clean as the missing values have been removed and the continuous values have been scaled. The features and target variable are all continuous, thus the exploratory data analysis will focus on the distribution of the features and the correlations between the features. "
Hey guys! I’ve done some preliminary EDA, but it is by no means complete and I need some help. I’m also not very well versed in Altair (I’m more comfortable with seaborn). Here are some of the issues that I’m facing:
For the latest version, I suggest removing the following graphs:
The "Pairplot of Numerical Columns" already contains most of the information they show, and we don't want redundant information in the final version.
For EDA of numerical features, this is what I'm used to plotting:
1. Histogram of individual features: to check whether each feature is normally distributed or follows some other distribution, whether it is unimodal or bimodal, and whether there is any skewness.
2. Boxplot: to check for the presence of outliers and skewness.

I sometimes replace 1 and 2 with a violin plot, since it combines the functionality of a density plot and a boxplot.
If you guys feel that the histogram is unnecessary, I'm okay with that too.
I agree with Charles. We can include these plots in the EDA, since they give us different perspectives on the dataset.
Issue solved in milestone 1.
For the EDA process, please add the items that you find necessary to include in EDA.