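Something similar to the per-feature summary table produced by this snippet from https://www.kaggle.com/artgor/is-this-malware-eda-fe-and-lgb-updated: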
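```python
# https://www.kaggle.com/artgor/is-this-malware-eda-fe-and-lgb-updated
import pandas as pd

# `train` is the training DataFrame loaded earlier in the notebook.
stats = []
for col in train.columns:
    stats.append((
        col,
        train[col].nunique(),                              # number of distinct non-null values
        train[col].isnull().sum() * 100 / train.shape[0],  # % of rows that are missing
        # value_counts sorts by frequency, so the first entry is the biggest category (NaN included)
        train[col].value_counts(normalize=True, dropna=False).values[0] * 100,
        train[col].dtype,
    ))

stats_df = pd.DataFrame(stats, columns=['Feature', 'Unique_values', 'Percentage of missing values',
                                        'Percentage of values in the biggest category', 'type'])
# Displayed as the cell output in a notebook
stats_df.sort_values('Percentage of missing values', ascending=False)
```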
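As a rough sketch of how this could be packaged as a reusable helper, the same statistics can be wrapped in a function that takes any DataFrame. The name `feature_summary` and the toy data in the demo are hypothetical; the column names and calculations mirror the snippet above.

```python
import pandas as pd


def feature_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per column with unique count, % missing,
    % share of the most frequent value, and dtype, sorted by % missing."""
    n_rows = len(df)
    rows = []
    for col in df.columns:
        series = df[col]
        rows.append((
            col,
            series.nunique(),                        # distinct non-null values
            series.isnull().sum() * 100 / n_rows,    # % of rows that are missing
            # value_counts sorts descending, so iloc[0] is the biggest category;
            # dropna=False counts NaN as its own category, as in the original snippet
            series.value_counts(normalize=True, dropna=False).iloc[0] * 100,
            series.dtype,
        ))
    summary = pd.DataFrame(rows, columns=['Feature', 'Unique_values', 'Percentage of missing values',
                                          'Percentage of values in the biggest category', 'type'])
    return summary.sort_values('Percentage of missing values', ascending=False).reset_index(drop=True)


if __name__ == "__main__":
    # Toy data for illustration only
    demo = pd.DataFrame({"a": [1, 2, 2, None], "b": ["x", "x", "x", "y"]})
    print(feature_summary(demo))
```

Because NaN is counted as a category, a column that is mostly missing will report a large "biggest category" share driven by the missing values themselves.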