mikecyterski opened 2 years ago
The plots listed here apply to regression problems (continuous response variable). Classification problems (categorical response variable) require a different set of plots and analytical techniques. This kind of analysis seems an important addition to the WebVB toolbox.
Initial Visualization Design:
Plots for All Features (pop-up window accessed from a central location, perhaps by right-clicking the top-left corner of the data table)
Missing data by column and row
Missing data by column
Missing data by row
Correlation matrix across all features
Dendrogram of feature clustering
Plots for Individual Features/Response (pop-up window accessed by right-clicking the column header?)
Time series of the active column
Density plot: the number of observations within each binned range of column values
Scatterplot of the active column vs. the response (with the correlation coefficient shown)
Box-and-whisker plot of the response values within each level of the active categorical column
Box-and-whisker plot of the values in the active continuous column
Pie chart of the number of observations per level of the active categorical column
(For the plots that follow, the user can toggle showing or hiding the results for any estimator.)
Fitted values versus actual response values across CV folds and repetitions DEFAULT
Fitted values and actual values versus row number (after ordering by actual value) across CV folds and repetitions OPTIONAL
Fitted values and actual values versus original row number (i.e., a time series plot) across CV folds and repetitions OPTIONAL: same as the previous plot, except that the x-axis is the original row number rather than the row number after ordering by response value
Model scores/metrics for each estimator as a boxplot DEFAULT
Model scores/metrics for each estimator as a lineplot OPTIONAL
For linear models, a table of feature coefficients and their significance OPTIONAL
For machine learning models, a table/plot of feature influence values OPTIONAL
For machine learning models, a partial dependence plot (PDP) showing how the model's predicted response varies, on average, across the range of the chosen feature OPTIONAL
For the chosen model, a new prediction, its prediction interval, and the original training-data results across CV folds and repetitions: ACTUAL y is UNKNOWN
For the chosen model, a new prediction, its prediction interval, and the original training-data results across CV folds and repetitions: ACTUAL y is KNOWN
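The model-output plots all draw on out-of-fold fitted values collected over repeated K-fold cross-validation. The stdlib-only sketch below shows one way to gather them; it is not WebVB's implementation. Two toy estimators (`fit_mean`, a baseline, and `fit_ols`, least squares with an intercept) stand in for the real estimators, RMSE stands in for whatever scores WebVB reports, and all function names are hypothetical. The partial dependence curve is the average prediction with one feature fixed at each grid value. The prediction interval simply shifts the new prediction by the 5th and 95th percentiles of the out-of-fold residuals; that is one simple heuristic assumed here, and WebVB may compute intervals differently. The data are synthetic.

```python
import math
import random
import statistics
from collections import defaultdict

def solve(M, b):
    """Solve M x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [a - f * p for a, p in zip(aug[r], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def fit_mean(X, y):
    """Toy baseline estimator: predicts the training mean for every row."""
    m = statistics.fmean(y)
    return lambda row: m

def fit_ols(X, y):
    """Toy linear estimator: least squares with an intercept (normal equations)."""
    A = [[1.0, *row] for row in X]
    p = len(A[0])
    XtX = [[sum(a[i] * a[j] for a in A) for j in range(p)] for i in range(p)]
    Xty = [sum(a[i] * t for a, t in zip(A, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    return lambda row: beta[0] + sum(w * v for w, v in zip(beta[1:], row))

def repeated_kfold(X, y, fitters, k=5, reps=3, seed=0):
    """Out-of-fold fitted values for every estimator, fold, and repetition."""
    rng = random.Random(seed)
    records = []  # tuples of (estimator, rep, fold, row, actual, fitted)
    for rep in range(reps):
        idx = list(range(len(y)))
        rng.shuffle(idx)
        for fold in range(k):
            test = idx[fold::k]
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            for name, fit in fitters.items():
                model = fit([X[i] for i in train], [y[i] for i in train])
                records += [(name, rep, fold, i, y[i], model(X[i])) for i in test]
    return records

def fold_rmse(records):
    """One RMSE per (estimator, rep, fold): the values behind a score boxplot."""
    sq = defaultdict(list)
    for name, rep, fold, _, actual, fitted in records:
        sq[(name, rep, fold)].append((actual - fitted) ** 2)
    return {key: math.sqrt(statistics.fmean(v)) for key, v in sq.items()}

def partial_dependence(model, X, feature, grid):
    """Average prediction with `feature` set to each grid value in every row."""
    return [statistics.fmean(model(row[:feature] + [g] + row[feature + 1:]) for row in X)
            for g in grid]

def empirical_interval(prediction, residuals, level=0.9):
    """Shift a point prediction by empirical quantiles of out-of-fold residuals."""
    cuts = statistics.quantiles(residuals, n=100, method="inclusive")  # percentiles 1..99
    lower = cuts[round(50 * (1 - level)) - 1]
    upper = cuts[round(50 * (1 + level)) - 1]
    return prediction + lower, prediction + upper

# Synthetic data (not from WebVB): y = 2 + 3*x1 - x2 + Gaussian noise.
rng = random.Random(42)
X = [[rng.uniform(0, 10), rng.uniform(0, 10)] for _ in range(60)]
y = [2 + 3 * a - b + rng.gauss(0, 1) for a, b in X]

records = repeated_kfold(X, y, {"mean": fit_mean, "ols": fit_ols})
scores = fold_rmse(records)
# Fitted vs. actual for one estimator, ordered by actual value.
ols_rows = sorted((r for r in records if r[0] == "ols"), key=lambda r: r[4])
ols_residuals = [actual - fitted for *_, actual, fitted in ols_rows]

final = fit_ols(X, y)
pred = final([5.0, 5.0])
lower, upper = empirical_interval(pred, ols_residuals)
pdp = partial_dependence(final, X, feature=0, grid=[0.0, 5.0, 10.0])
print(f"prediction {pred:.2f}, 90% interval ({lower:.2f}, {upper:.2f})")
```

For the case where the actual y is known, the same interval can be compared against the observed value (for example, checking `lower <= y_actual <= upper`, where `y_actual` is a hypothetical name for that observation).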