After refactoring data-prep code or refreshing a data set, you may want to check the latest data against the original to see how much your numeric and categorical features have changed. Summary statistics can do this, but they become tedious, especially when visualizing data drift across multiple data sets over time.
I propose a package that makes visual comparison of data sets simple and automated. A user would supply multiple data sets, and the package would provide functions that prepare visual aids for studying how the data changed. Numeric distributions would be compared with histograms, and categorical distributions with bar plots.
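The proposal does not name an implementation language or API. The sketch below uses Python with only the standard library to show one possible data-preparation step behind these plots. The function names `shared_bin_edges`, `histogram_counts` and `category_proportions` are hypothetical. The key assumption is that every data set is binned on the same edges and reported over the same set of categories. With shared edges and categories, the bars line up across data sets, and categories that appear or disappear between refreshes show up as zeros.

```python
from collections import Counter
from typing import Dict, List, Sequence


def shared_bin_edges(datasets: Dict[str, Sequence[float]], n_bins: int = 10) -> List[float]:
    """Hypothetical helper: equal-width bin edges spanning the values of ALL
    data sets, so every histogram is drawn on the same x-axis."""
    values = [v for data in datasets.values() for v in data]
    if not values:
        raise ValueError("no numeric values supplied")
    lo, hi = min(values), max(values)
    if lo == hi:  # constant column: widen the range so bins have nonzero width
        lo, hi = lo - 0.5, hi + 0.5
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins)] + [hi]


def histogram_counts(data: Sequence[float], edges: List[float]) -> List[int]:
    """Hypothetical helper: count values per bin; the last bin is closed on the right."""
    n_bins = len(edges) - 1
    lo, hi = edges[0], edges[-1]
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in data:
        # Clamp so that the maximum value falls into the last bin.
        idx = min(max(int((v - lo) / width), 0), n_bins - 1)
        counts[idx] += 1
    return counts


def category_proportions(datasets: Dict[str, Sequence[str]]) -> Dict[str, Dict[str, float]]:
    """Hypothetical helper: share of each category per data set, computed over the
    union of all categories so that missing categories are reported as 0.0."""
    categories = sorted({c for data in datasets.values() for c in data})
    result = {}
    for name, data in datasets.items():
        counts = Counter(data)
        total = len(data) or 1  # avoid division by zero for an empty data set
        result[name] = {c: counts[c] / total for c in categories}
    return result


if __name__ == "__main__":
    numeric = {"2023": [1.0, 2.0, 2.5, 3.0], "2024": [2.0, 3.5, 4.0, 5.0]}
    edges = shared_bin_edges(numeric, n_bins=4)
    print({name: histogram_counts(data, edges) for name, data in numeric.items()})
    # prints {'2023': [1, 2, 1, 0], '2024': [0, 1, 1, 2]}

    categorical = {"2023": ["a", "a", "b", "c"], "2024": ["a", "b", "b", "d"]}
    print(category_proportions(categorical))
```

The resulting counts and proportions could then be handed to any plotting library to draw side-by-side histograms and bar plots.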
The package's main dependencies would include: