ThrunGroup / FastForest


Reproduce experiments #235

Closed vxbrandon closed 2 years ago

vxbrandon commented 2 years ago

Reproduce the experiment results on an EC2 instance (t2.2xlarge). The results can be reproduced by running bash repro_script.sh

==============================
Table 1 (Classification): MNIST
                                        Model                                   Time (s)                                Number of Insertions                    Accuracy
1st row                                 RF                                      2149.006 ± 13.179                       1.44E+08 ± 4.85E+05                     0.777 ± 0.005
1st row                                 RF + MABSplit                           60.034 ± 0.413                          3.37E+06 ± 1.62E+04                     0.763 ± 0.008
2nd row                                 ExtraTrees                              2476.905 ± 4.571                        1.68E+08 ± 0.00E+00                     0.762 ± 0.003
2nd row                                 ExtraTrees + MABSplit                   74.171 ± 0.244                          4.32E+06 ± 7.69E+03                     0.755 ± 0.002
3rd row                                 RP                                      1980.633 ± 9.167                        1.32E+08 ± 6.95E+05                     0.771 ± 0.003
3rd row                                 RP + MABSplit                           56.512 ± 0.358                          3.17E+06 ± 1.40E+04                     0.768 ± 0.003
==============================
Table 1 (Classification): APS
                                        Model                                   Time (s)                                Number of Insertions                    Accuracy
1st row                                 RF                                      29.559 ± 0.117                          3.77E+06 ± 9.66E+03                     0.985 ± 0.0
1st row                                 RF + MABSplit                           0.791 ± 0.005                           6.94E+04 ± 2.19E+02                     0.985 ± 0.0
2nd row                                 ExtraTrees                              27.013 ± 0.079                          3.78E+06 ± 0.00E+00                     0.985 ± 0.0
2nd row                                 ExtraTrees + MABSplit                   0.695 ± 0.003                           7.00E+04 ± 0.00E+00                     0.985 ± 0.0
3rd row                                 RP                                      25.574 ± 0.116                          3.22E+06 ± 1.18E+04                     0.985 ± 0.0
3rd row                                 RP + MABSplit                           0.702 ± 0.006                           5.96E+04 ± 2.19E+02                     0.985 ± 0.0
==============================
Table 1 (Classification): FLIGHT
                                        Model                                   Time (s)                                Number of Insertions                    Accuracy
1st row                                 RF                                      87.382 ± 0.448                          1.16E+07 ± 3.01E+04                     0.815 ± 0.0
1st row                                 RF + MABSplit                           1.534 ± 0.009                           1.29E+05 ± 4.38E+02                     0.815 ± 0.0
2nd row                                 ExtraTrees                              87.281 ± 0.082                          1.17E+07 ± 0.00E+00                     0.815 ± 0.0
2nd row                                 ExtraTrees + MABSplit                   1.391 ± 0.004                           1.30E+05 ± 0.00E+00                     0.815 ± 0.0
3rd row                                 RP                                      79.618 ± 0.245                          1.06E+07 ± 3.01E+04                     0.815 ± 0.0
3rd row                                 RP + MABSplit                           1.429 ± 0.018                           1.18E+05 ± 3.35E+02                     0.815 ± 0.0
==============================
Table 1 (Classification): COVTYPE
                                        Model                                   Time (s)                                Number of Insertions                    Accuracy
1st row                                 RF                                      167.836 ± 0.388                         1.86E+07 ± 0.00E+00                     0.559 ± 0.028
1st row                                 RF + MABSplit                           1.564 ± 0.012                           3.98E+04 ± 1.79E+02                     0.505 ± 0.004
2nd row                                 ExtraTrees                              167.118 ± 0.26                          1.86E+07 ± 0.00E+00                     0.539 ± 0.022
2nd row                                 ExtraTrees + MABSplit                   1.881 ± 0.409                           1.06E+05 ± 3.88E+04                     0.5 ± 0.005
3rd row                                 RP                                      145.539 ± 0.866                         1.62E+07 ± 8.31E+04                     0.51 ± 0.008
3rd row                                 RP + MABSplit                           1.418 ± 0.018                           3.50E+04 ± 0.00E+00                     0.507 ± 0.005

==============================
Table 2 (Regression): SKLEARN_REGRESSION
                                        Model                                   Time (s)                                Number of Insertions                    MSE
1st row                                 RF                                      6.698 ± 0.145                           4.00E+07 ± 0.00E+00                     5524.814 ± 28.441
1st row                                 RF + MABSplit                           2.714 ± 0.287                           2.50E+05 ± 0.00E+00                     5524.814 ± 28.441
2nd row                                 ExtraTrees                              6.896 ± 0.024                           4.00E+07 ± 0.00E+00                     5087.722 ± 18.239
2nd row                                 ExtraTrees + MABSplit                   1.82 ± 0.031                            2.51E+05 ± 7.16E+02                     5097.103 ± 30.911
3rd row                                 RP                                      5.485 ± 0.039                           3.36E+07 ± 0.00E+00                     5399.368 ± 60.309
3rd row                                 RP + MABSplit                           1.977 ± 0.058                           2.14E+05 ± 3.51E+03                     5399.368 ± 60.309
==============================
Table 2 (Regression): AIR
                                        Model                                   Time (s)                                Number of Insertions                    MSE
1st row                                 RF                                      42.441 ± 0.756                          2.59E+08 ± 4.47E+05                     2130.908 ± 1.637
1st row                                 RF + MABSplit                           20.49 ± 0.798                           6.85E+05 ± 1.22E+03                     2021.828 ± 26.928
2nd row                                 ExtraTrees                              40.015 ± 0.147                          2.67E+08 ± 0.00E+00                     1929.588 ± 22.164
2nd row                                 ExtraTrees + MABSplit                   15.945 ± 0.076                          7.18E+05 ± 6.11E+03                     1917.062 ± 14.48
3rd row                                 RP                                      33.075 ± 0.148                          2.17E+08 ± 3.71E+05                     2083.698 ± 22.226
3rd row                                 RP + MABSplit                           14.496 ± 0.05                           5.83E+05 ± 3.85E+03                     2069.932 ± 27.657
==============================
Table 2 (Regression): GPU
                                        Model                                   Time (s)                                Number of Insertions                    MSE
1st row                                 RF                                      39.572 ± 0.145                          1.78E+08 ± 2.30E+04                     69733.002 ± 57.401
1st row                                 RF + MABSplit                           25.599 ± 0.288                          4.30E+06 ± 4.03E+04                     69493.921 ± 73.133
2nd row                                 ExtraTrees                              40.529 ± 0.21                           1.80E+08 ± 0.00E+00                     69734.948 ± 54.876
2nd row                                 ExtraTrees + MABSplit                   25.225 ± 0.274                          4.63E+06 ± 4.07E+04                     69585.029 ± 80.281
3rd row                                 RP                                      31.502 ± 0.208                          1.50E+08 ± 5.55E+04                     66364.998 ± 894.568
3rd row                                 RP + MABSplit                           27.492 ± 1.571                          5.23E+06 ± 3.92E+05                     66310.138 ± 896.237

==============================
Table 3 (Classification): MNIST
                                        Model                                   Number of Trees                         Accuracy
1st row                                 RF                                      0.2 ± 0.179                             0.143 ± 0.026
1st row                                 RF + MABSplit                           15.8 ± 0.179                            0.83 ± 0.002
2nd row                                 ExtraTrees                              0.2 ± 0.179                             0.144 ± 0.027
2nd row                                 ExtraTrees + MABSplit                   12.0 ± 0.0                              0.814 ± 0.001
3rd row                                 RP                                      1.0 ± 0.0                               0.253 ± 0.003
3rd row                                 RP + MABSplit                           16.8 ± 0.179                            0.832 ± 0.002
==============================
Table 3 (Classification): APS
                                        Model                                   Number of Trees                         Accuracy
1st row                                 RF                                      1.0 ± 0.0                               0.985 ± 0.0
1st row                                 RF + MABSplit                           5.8 ± 0.179                             0.989 ± 0.0
2nd row                                 ExtraTrees                              1.0 ± 0.0                               0.985 ± 0.0
2nd row                                 ExtraTrees + MABSplit                   5.6 ± 0.219                             0.989 ± 0.0
3rd row                                 RP                                      1.0 ± 0.0                               0.985 ± 0.0
3rd row                                 RP + MABSplit                           6.8 ± 0.179                             0.989 ± 0.0
==============================
Table 3 (Classification): FLIGHT
                                        Model                                   Number of Trees                         Accuracy
1st row                                 RF                                      0.2 ± 0.179                             0.815 ± 0.0
1st row                                 RF + MABSplit                           14.6 ± 0.219                            0.815 ± 0.0
2nd row                                 ExtraTrees                              0.0 ± 0.0                               0.815 ± 0.0
2nd row                                 ExtraTrees + MABSplit                   9.6 ± 0.219                             0.815 ± 0.0
3rd row                                 RP                                      0.0 ± 0.0                               0.815 ± 0.0
3rd row                                 RP + MABSplit                           16.2 ± 0.593                            0.815 ± 0.0
==============================
Table 3 (Classification): COVTYPE
                                        Model                                   Number of Trees                         Accuracy
1st row                                 RF                                      0.4 ± 0.219                             0.514 ± 0.019
1st row                                 RF + MABSplit                           99.8 ± 0.179                            0.675 ± 0.002
2nd row                                 ExtraTrees                              0.4 ± 0.219                             0.501 ± 0.007
2nd row                                 ExtraTrees + MABSplit                   29.6 ± 1.824                            0.676 ± 0.002
3rd row                                 RP                                      0.6 ± 0.219                             0.534 ± 0.03
3rd row                                 RP + MABSplit                           100.0 ± 0.0                             0.675 ± 0.002

==============================
Table 4 (Regression): SKLEARN_REGRESSION
                                        Model                                   Number of Trees                         Test MSE
1st row                                 RF                                      1.0 ± 0.0                               2479.698 ± 52.644
1st row                                 RF + MABSplit                           18.0 ± 0.0                              729.302 ± 13.139
2nd row                                 RP                                      1.0 ± 0.0                               2140.669 ± 260.937
2nd row                                 RP + MABSplit                           9.8 ± 0.716                             1005.343 ± 89.86
3rd row                                 ExtraTrees                              0.6 ± 0.219                             5677.874 ± 1611.928
3rd row                                 ExtraTrees + MABSplit                   18.0 ± 0.0                              689.331 ± 5.093
==============================
Table 4 (Regression): AIR
                                        Model                                   Number of Trees                         Test MSE
1st row                                 RF                                      0.0 ± 0.0                               3208.93 ± 0.0
1st row                                 RF + MABSplit                           14.0 ± 0.0                              886.386 ± 4.21
2nd row                                 RP                                      0.0 ± 0.0                               3208.93 ± 0.0
2nd row                                 RP + MABSplit                           12.4 ± 0.358                            863.118 ± 5.501
3rd row                                 ExtraTrees                              0.0 ± 0.0                               3208.93 ± 0.0
3rd row                                 ExtraTrees + MABSplit                   10.4 ± 0.219                            834.439 ± 4.363

==============================
Table 5 Stability Model (Budget: Q * 100000)
                                        Importance Model                        Dataset                                 Stability
1st row                                 HRFC+MID                                Random Classification                   0.536 ± 0.039
1st row                                 HRFC+MID + MABSplit                     Random Classification                   0.863 ± 0.016
2nd row                                 HRFR+MID                                Random Regression                       0.134 ± 0.021
2nd row                                 HRFR+MID + MABSplit                     Random Regression                       0.674 ± 0.043
3rd row                                 HRFC+Perm                               Random Classification                   0.579 ± 0.023
3rd row                                 HRFC+Perm + MABSplit                    Random Classification                   0.69 ± 0.023
4th row                                 HRFR+Perm                               Random Regression                       0.116 ± 0.017
4th row                                 HRFR+Perm + MABSplit                    Random Regression                       0.437 ± 0.044

==============================
Table 6 Compare our model vs sklearn
                                        Model                                   Task and Dataset                        Performance Metric                      Test Performance
1st row                                 RFC(Sklearn)                            Classification: 20 Newsgroups           Accuracy                                0.869 ± 0.01
1st row                                 RFC(Ours)                               Classification: 20 Newsgroups           Accuracy                                0.866 ± 0.015
2nd row                                 ERFC(Sklearn)                           Classification: 20 Newsgroups           Accuracy                                0.758 ± 0.038
2nd row                                 ERFC(Ours)                              Classification: 20 Newsgroups           Accuracy                                0.761 ± 0.04
3rd row                                 RFR(Sklearn)                            Regression: California Housing          MSE                                     0.322 ± 0.009
3rd row                                 RFR(Ours)                               Regression: California Housing          MSE                                     0.324 ± 0.008
4th row                                 ERFR(Sklearn)                           Regression: California Housing          MSE                                     0.612 ± 0.023
4th row                                 ERFR(Ours)                              Regression: California Housing          MSE                                     0.615 ± 0.023

Figure_1 Figure_2

What was fixed:

  1. When selecting features, break ties randomly among features that have identical feature importance scores.
  2. Change the hyperparameters of the feature selection stability experiment (Table 5) and the budget experiments (Table 2 and Table 4) to strengthen the experimental results. The original code in the dataset_experiments branch failed to produce satisfactory results for Table 5.
  3. Uncomment the commented-out lines that prevented the one-line reproduce script from running.
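
The random tie-breaking in item 1 could look roughly like the sketch below. This is only an illustration and not the code from this repository: the function name `select_top_k_features` and its parameters are hypothetical. The idea is to shuffle the feature indices first and then apply a stable sort by descending importance, so that features with equal scores end up in random relative order instead of always favoring the lowest index.

```python
import numpy as np


def select_top_k_features(importances, k, rng=None):
    """Return indices of the k highest-importance features.

    Hypothetical helper (not the FastForest API): ties between features
    with identical importance scores are broken uniformly at random.
    """
    rng = np.random.default_rng() if rng is None else rng
    importances = np.asarray(importances, dtype=float)

    # Randomly permute the feature indices.
    perm = rng.permutation(importances.size)

    # Stable sort by descending importance: equal scores keep the
    # (random) relative order given by the permutation above.
    order = perm[np.argsort(-importances[perm], kind="stable")]
    return order[:k]


if __name__ == "__main__":
    scores = [0.5, 0.2, 0.2, 0.2, 0.1]
    # Feature 0 is always selected; the second pick varies among 1, 2, 3.
    print(select_top_k_features(scores, k=2))
```

Without the shuffle, a plain sort would pick the same tied feature on every run, which can make a stability measurement depend on feature ordering rather than on the importance model itself.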