facebookresearch / hiplot

HiPlot makes understanding high dimensional data easy
https://facebookresearch.github.io/hiplot/
MIT License
2.75k stars 143 forks

Auto-ranking of most explicative features #228

Open danthe3rd opened 2 years ago

danthe3rd commented 2 years ago

Scenario: I run a grid search over parameters A, B and C. Each sample has an associated loss, which I am trying to minimize.

I want to find out automatically which parameter (A, B or C) has the most influence on the loss.

In Python: This can be done by fitting a simple RandomForestRegressor (or a RandomForestClassifier, depending on the target value type), and then calling permutation_importance to get an importance score for each parameter. To be embedded in HiPlot, this would need to be done in JS (for example with this library?)

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html https://scikit-learn.org/stable/modules/permutation_importance.html
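A minimal sketch of the Python side of this idea, assuming the grid-search results are available as a list of dicts with numeric values (the same shape HiPlot accepts as input). The helper name `rank_parameters`, its `target` and `exclude` arguments, and the synthetic data are made up for illustration and are not part of HiPlot; only `RandomForestRegressor` and `permutation_importance` are real scikit-learn calls.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance


def rank_parameters(rows, target, exclude=()):
    """Return (parameter, mean importance) pairs, most influential first.

    Hypothetical helper: `rows` is a list of dicts with numeric values,
    `target` is the column to explain, `exclude` lists columns to ignore.
    """
    params = [k for k in rows[0] if k != target and k not in exclude]
    X = np.array([[row[p] for p in params] for row in rows], dtype=float)
    y = np.array([row[target] for row in rows], dtype=float)

    # Fit a simple forest, then measure how much the score drops
    # when each column is shuffled.
    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
    result = permutation_importance(model, X, y, n_repeats=10, random_state=0)

    order = np.argsort(result.importances_mean)[::-1]
    return [(params[i], float(result.importances_mean[i])) for i in order]


# Synthetic grid search (assumption for the example): the loss depends
# strongly on A, moderately on B, and not at all on C.
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
rows = [
    {"A": a, "B": b, "C": c, "loss": 10.0 * a + 3.0 * b}
    for a in grid for b in grid for c in grid
]

ranking = rank_parameters(rows, target="loss")
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Here the importances are computed on the training data, which is probably acceptable for a quick ranking of grid-search parameters; a held-out split would give a less optimistic estimate. The `exclude` argument is one possible way to let the user leave columns out of the calculation.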

UI: This could be triggered by right-clicking a column. The result could be displayed by ordering the columns by relative importance. There also needs to be a way to select which columns to include in or exclude from the calculation, and to display the importance score for each column.