bcjaeger opened this issue 2 years ago
I would love to help with this!
First analysis: observe how performance varies across values of leaf_min_events. I used 3-fold cross-validation with 10 repeats and a prediction horizon of 350 days (a code sketch of this loop follows the results below).
This is the c-statistic for the AIDS endpoint; this trend has been consistent.
Death: fewer events here. Performance was rather inconsistent before I set a seed. A line is fit, but there is really little trend (leaf_min_events = 4 may be an outlier). The c-statistic, however, is not bad across the board.
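For reference, here is a minimal sketch of the kind of loop described above. It assumes the actg320 data frame has a follow-up time column `time` and an event indicator `censor`; those column names and the grid of leaf_min_events values are placeholders rather than the exact code I ran.

```r
library(aorsf)
library(survival)

leaf_min_events_grid <- c(1, 2, 4, 8, 16)  # placeholder values
n_repeats <- 10
n_folds   <- 3
horizon   <- 350

results <- expand.grid(leaf_min_events = leaf_min_events_grid,
                       repeat_id       = seq_len(n_repeats),
                       fold            = seq_len(n_folds))
results$cstat <- NA_real_

for (i in seq_len(nrow(results))) {

  # same fold assignment for every parameter value within a repeat
  set.seed(results$repeat_id[i])
  folds <- sample(rep(seq_len(n_folds), length.out = nrow(actg320)))

  train <- actg320[folds != results$fold[i], ]
  test  <- actg320[folds == results$fold[i], ]

  fit <- orsf(train, Surv(time, censor) ~ .,
              leaf_min_events = results$leaf_min_events[i])

  # predicted risk of the event by the 350-day horizon
  test$risk <- as.numeric(predict(fit, new_data = test,
                                  pred_horizon = horizon,
                                  pred_type = "risk"))

  # Harrell's c-statistic; reverse = TRUE because higher risk means shorter survival
  results$cstat[i] <- concordance(Surv(time, censor) ~ risk,
                                  data = test, reverse = TRUE)$concordance
}

# average c-statistic for each leaf_min_events value
aggregate(cstat ~ leaf_min_events, data = results, FUN = mean)
```

The c-statistic here is Harrell's concordance on the held-out fold; a time-dependent AUC evaluated at the 350-day horizon would be a stricter horizon-specific alternative.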
Future plans:
- Amend code to make it more efficient, with a better output format
- Add the Brier score (a hedged sketch follows this list)
- Vary split_min_obs
- Perhaps try Monte Carlo cross-validation in addition to CV (increase the number of repeats?)
- Maybe try varying mtry down the line (probably not n_retry)
- Adding a curved line to the performance graphs may be preferable, in anticipation of some point of optimal performance
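Since the Brier score is on the to-do list, here is a hedged sketch of how it could be added at the same 350-day horizon with riskRegression::Score, reusing the `fit`, `test`, and `horizon` objects from the loop above; the `time` and `censor` column names are again assumptions.

```r
library(riskRegression)

pred_risk <- predict(fit, new_data = test,
                     pred_horizon = horizon, pred_type = "risk")

scores <- Score(
  object  = list(aorsf = pred_risk),   # matrix of predicted risks: rows = test obs, cols = horizons
  formula = Surv(time, censor) ~ 1,    # ~ 1 => censoring weights from a Kaplan-Meier model
  data    = test,
  times   = horizon,
  metrics = c("auc", "brier")
)

scores$Brier$score  # IPCW Brier score, with the null model as a benchmark
scores$AUC$score    # time-dependent AUC at the horizon
```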
aorsf works very well on ACTG320 mortality prediction and it would be great to figure out why it works well
Plan:
Here is a little synopsis of what I would like to check with the actg320 data:
The data have two main endpoints, death and AIDS diagnosis. For both endpoints, I want to see how well aorsf performs with a number of different hyper-parameter values. In other words, I am guessing that the performance of aorsf on this dataset is going to depend on how well we tune it. The main tuning parameters for aorsf are below (copied from
?aorsf::orsf
). I think we could set up a simple experiment where we make a dataset with one column for each tuning parameter, with each row having a specific set of inputs for orsf(), and then we assess the performance of each set of parameter inputs using cross-validation, probably with just 3 folds b/c the count of events is low. This would be a great exercise and should also provide some useful info for us, i.e., we may change the default values of orsf() for datasets with smaller event counts.@kristinlenoir, would you like to help me with this?