MattJBritton / ForestfortheTrees

Interactive visualization of ensemble ML algorithms (e.g. Gradient Boosting Classifiers) for explainable ML.
GNU General Public License v3.0

Represent categorical features in scatterplot as sized bubbles #2

Open MattJBritton opened 5 years ago

MattJBritton commented 5 years ago

Currently, the datapoint scatterplot in the components view does not represent categorical features well. Points for a categorical feature overplot one another and land on the axes or in the gaps between the heatmap squares. Fix this by:

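A) aggregating data cases by category and calculating a sum (the number of cases per category, used to size each bubble), and B) adjusting their locations so each bubble falls inside its category's square.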

This requires pre-calculating new data subsets before the call to explain() (since these distributions do not change for a given dataset and granularity), so maybe perform this calculation in build_base_model()?
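A minimal sketch of the aggregation proposed above, under stated assumptions: the helper name `aggregate_category_bubbles` is hypothetical, and it assumes category `i` is drawn as the heatmap square spanning `[i, i + 1)` on the feature axis, so its center is `i + 0.5`. It only illustrates the idea of counting cases per category and centering one bubble per square; it is not the project's code.

```python
from collections import Counter
from typing import Dict, List, Sequence


def aggregate_category_bubbles(
    values: Sequence[str], categories: Sequence[str]
) -> List[Dict[str, object]]:
    """Collapse data cases that share a category into one sized bubble.

    Hypothetical helper: assumes category i is drawn as the heatmap square
    spanning [i, i + 1) on the feature axis, so its center is i + 0.5.
    """
    counts = Counter(values)  # A) number of data cases per category
    bubbles = []
    for i, cat in enumerate(categories):
        if counts[cat] == 0:
            continue  # no data cases in this category, so no bubble to draw
        bubbles.append({
            "category": cat,
            "count": counts[cat],  # drives the bubble's size
            "x": i + 0.5,  # B) centered inside the category's square
        })
    return bubbles


bubbles = aggregate_category_bubbles(
    ["red", "blue", "red", "green", "red"], ["blue", "green", "red"]
)
```

Because the counts depend only on the dataset and the category layout, not on the boosting iteration, a result like this could be computed once and cached, which is consistent with the suggestion to do it in build_base_model().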

MattJBritton commented 5 years ago

Actually, I think a better method is to display all points, but center them correctly in their square and add some random jitter so they all remain visible. The reason is that if points are merged, we can't encode any per-point values into the merged point. In particular, it would be impossible for play_components() to size and color points by the change in their prediction for the latest iteration.
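A minimal sketch of the centered-jitter alternative, assuming category `i` occupies `[i * width, (i + 1) * width)` along the categorical axis. The function name `jitter_in_square` and its `width`, `spread`, and `seed` parameters are hypothetical. Each point keeps its own position, so per-point attributes such as the change in prediction can still drive its size and color.

```python
import random
from typing import List, Optional, Sequence


def jitter_in_square(
    category_indices: Sequence[int],
    width: float = 1.0,
    spread: float = 0.35,
    seed: Optional[int] = None,
) -> List[float]:
    """Place each data point near the center of its category's square.

    Hypothetical helper: assumes category i occupies [i * width, (i + 1) * width)
    on the axis. `spread` is the fraction of the square width the jitter may
    cover; keeping it below 1 keeps every point strictly inside its square.
    """
    if not 0 <= spread < 1:
        raise ValueError("spread must be in [0, 1)")
    rng = random.Random(seed)  # seeded so positions are stable between redraws
    half = spread * width / 2
    positions = []
    for i in category_indices:
        center = (i + 0.5) * width  # center of the category's square
        positions.append(center + rng.uniform(-half, half))
    return positions


positions = jitter_in_square([0, 2, 2, 1], seed=42)
```

Passing a fixed seed keeps each point in the same spot across iterations, so only its size and color change when play_components() steps through the model.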