Closed pbiecek closed 6 years ago
We went for 1 fake obs = 1 variable changed, which was fine for low p.
Perhaps we need to vary the number of changed variables according to the p/n ratio by default, and let the user decide how many variables should be changed for each simulated observation if that's not enough.
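To make the p/n idea concrete, here is a minimal sketch in plain Python. It is not the package's code: the function names, the `ceil(p / n)` default, and the choice to draw replacement values from a randomly picked row of the data are all assumptions made for illustration. The reasoning behind the default is that n fake observations with k changes each can touch at most n * k variables, so k needs to be at least p / n for every variable to have a chance of being perturbed.

```python
import math
import random


def default_n_changed(p, n):
    """Hypothetical default: changes per fake observation so that n
    observations can jointly touch all p variables (assumed heuristic)."""
    return max(1, math.ceil(p / n))


def simulate_neighbours(instance, data, n, n_changed=None, seed=None):
    """Create n fake observations around `instance`.

    Each fake observation copies `instance` and replaces `n_changed`
    randomly chosen variables with values taken from a random row of `data`.
    If `n_changed` is None, the p/n-based default above is used.
    """
    rng = random.Random(seed)
    p = len(instance)
    if n_changed is None:
        n_changed = default_n_changed(p, n)
    n_changed = min(n_changed, p)  # cannot change more variables than exist

    fakes = []
    for _ in range(n):
        fake = list(instance)
        for j in rng.sample(range(p), n_changed):
            fake[j] = rng.choice(data)[j]
        fakes.append(fake)
    return fakes
```

With low p the default stays at one changed variable per fake observation, which matches the original behaviour described above; passing `n_changed` explicitly is the "let the user decide" part.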
This is more of a comment on the last commit (same label for all fake obs). No idea yet for the second problem, because regularization won't work with the forest plot.
Interesting, this requires some more advanced studies. Good for a grant proposal ;-)
Closing, because no progress will be made in the near future. Noted as an idea for a grant proposal.
In the TCGA use case we have around 20000 predictors, which causes two types of problems:
1) Neither ranger nor randomForest works with this number of features, so I am calling them on a subset of 10k features.
2) For white box classifiers we need more samples in the surroundings than dimensions, so the default of 50 is far from enough (otherwise we fall into the p >> n problem for the white box). And 20k is too time-consuming.
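The workaround in point 1 can be sketched as follows. The thread does not say how the 10k subset was chosen, so the uniform random draw here is an assumption, and `subset_features` is a hypothetical helper written in plain Python rather than a call from ranger or randomForest.

```python
import random


def subset_features(rows, feature_names, k, seed=None):
    """Keep k randomly chosen columns (assumed selection rule).

    Returns the kept feature names and the reduced rows, with the
    original column order preserved.
    """
    rng = random.Random(seed)
    keep = sorted(rng.sample(range(len(feature_names)), k))
    names = [feature_names[j] for j in keep]
    reduced = [[row[j] for j in keep] for row in rows]
    return names, reduced
```

The reduced table would then be passed to the black box model in place of the full one. A variance-based or otherwise informed filter could be swapped in for the random draw without changing the interface.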