wide data - big problem

ModelOriented / live

Local Interpretable (Model-agnostic) Visual Explanations - model visualization for regression problems and tabular data, based on the LIME method. Available on CRAN.

https://modeloriented.github.io/live/

wide data - big problem #28

Closed pbiecek closed 6 years ago

pbiecek commented 7 years ago

In the TCGA use case we have about 20,000 predictors, which causes two kinds of problems:

1) Neither ranger nor randomForest works with this number of features, so I am calling them on a subset of 10k features.

2) For white box classifiers we need more samples in the surroundings than dimensions. So the default of 50 is nowhere near enough (otherwise we run into the p >> n problem for the white box), and generating 20k samples is too time-consuming.
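The sample-size constraint in point 2 can be made concrete with a little arithmetic. The sketch below assumes the white box is an unpenalized linear model with an intercept, which is an assumption for illustration and not necessarily what live fits. The function names are hypothetical. Such a model has p + 1 coefficients, so at least p + 1 simulated observations are necessary (though not sufficient) for a unique least-squares fit.

```python
def min_samples_for_linear_fit(p, intercept=True):
    """Smallest n for which an unpenalized linear model with p predictors
    can have a unique least-squares solution (necessary condition only)."""
    return p + (1 if intercept else 0)


def is_underdetermined(n, p, intercept=True):
    """True when there are fewer observations than coefficients,
    i.e. the p >> n situation described above."""
    return n < min_samples_for_linear_fit(p, intercept)


# The numbers from the issue: default neighbourhood size vs. TCGA width.
default_size = 50
n_predictors = 20000

print(is_underdetermined(default_size, n_predictors))  # True
print(min_samples_for_linear_fit(n_predictors))        # 20001
```

With the numbers from this issue, the default neighbourhood is short by roughly 20,000 observations, which is why simply raising the sample size runs into the run-time problem.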

mstaniak commented 7 years ago

We went for 1 fake obs = 1 variable changed, which was fine for low p.

Perhaps we need to vary the number of changed variables according to the p/n ratio by default, and let the user decide how many variables should be changed for each simulated observation if that is not enough.
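A minimal sketch of that idea in Python, for illustration only: it is not live's implementation. The function names, the `n_changed` argument, and the default rule (change about p / n variables per fake observation, so that each variable is perturbed roughly once across the neighbourhood) are all assumptions.

```python
import math
import random


def default_n_changed(p, n_fake):
    """Hypothetical default: scale the number of changed variables
    with the p/n ratio, but always change at least one."""
    return max(1, math.ceil(p / n_fake))


def simulate_neighbourhood(observation, data, n_fake, n_changed=None, seed=None):
    """Create n_fake copies of `observation`, each with `n_changed`
    randomly chosen variables replaced by values drawn from `data`
    (a list of rows with the same length as `observation`)."""
    rng = random.Random(seed)
    p = len(observation)
    if n_changed is None:
        n_changed = default_n_changed(p, n_fake)
    n_changed = min(n_changed, p)  # cannot change more variables than exist

    fake = []
    for _ in range(n_fake):
        new_obs = list(observation)
        # Pick distinct column indices to perturb in this fake observation.
        for j in rng.sample(range(p), n_changed):
            new_obs[j] = rng.choice(data)[j]
        fake.append(new_obs)
    return fake


# Low p: the rule falls back to one changed variable per fake observation.
print(default_n_changed(p=5, n_fake=50))      # 1
# Wide data as in the TCGA case: many variables change at once.
print(default_n_changed(p=20000, n_fake=50))  # 400
```

Passing `n_changed` explicitly overrides the default, which corresponds to letting the user decide when the ratio-based choice is not enough.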

This is more of a comment on the last commit (same label for all fake obs). No idea yet for the second problem, because regularization won't work with the forest plot.

pbiecek commented 7 years ago

Interesting, this requires some more advanced studies. Good for a grant proposal ;-)

mstaniak commented 6 years ago

Closing, because no progress will be made in the near future. Noted as an idea for a grant proposal.