Closed janweinreich closed 1 year ago
OK, with proper hyperparameter testing I find that KRR improves the MAE but not the RMSE. In other words, the typical error goes down, but the outliers are not predicted any better.
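To illustrate why MAE can drop while RMSE barely moves, here is a minimal sketch on made-up absolute errors. The numbers are purely illustrative, not taken from the benchmark. Shrinking the small errors while leaving one large outlier untouched lowers the MAE noticeably, but the RMSE stays dominated by the outlier.

```python
import numpy as np

# Hypothetical absolute errors (illustrative only, not benchmark data):
# three typical errors and one large outlier.
errors_before = np.array([0.4, 0.4, 0.4, 3.0])
# Same outlier, but the typical errors shrink.
errors_after = np.array([0.1, 0.1, 0.1, 3.0])

def mae(errors):
    return float(np.mean(np.abs(errors)))

def rmse(errors):
    return float(np.sqrt(np.mean(errors ** 2)))

# Relative improvement of each metric.
mae_drop = 1.0 - mae(errors_after) / mae(errors_before)
rmse_drop = 1.0 - rmse(errors_after) / rmse(errors_before)

print(f"MAE:  {mae(errors_before):.3f} -> {mae(errors_after):.3f} ({mae_drop:.1%} lower)")
print(f"RMSE: {rmse(errors_before):.3f} -> {rmse(errors_after):.3f} ({rmse_drop:.1%} lower)")
```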
WITH KERNEL RIDGE
Best gamma: 233.57214690901213, best lambda: 1e-06, best score: 0.5599158927309894
freesolv (1 tasks) {'dataset': 'freesolv', 'splitter': 'random', 'task': 'regression_kernel_ridge', 'gammas': array([1.00000000e-03, 2.06913808e-03, 4.28133240e-03, 8.85866790e-03, 1.83298071e-02, 3.79269019e-02, 7.84759970e-02, 1.62377674e-01, 3.35981829e-01, 6.95192796e-01, 1.43844989e+00, 2.97635144e+00, 6.15848211e+00, 1.27427499e+01, 2.63665090e+01, 5.45559478e+01, 1.12883789e+02, 2.33572147e+02, 4.83293024e+02, 1.00000000e+03]), 'lambdas': [1e-06], 'augment': 0, 'k': 10, 'preprocess': False, 'sub_sample': 0.0, 'is_imbalanced': True, 'n': 4}
Valid RMSE: 0.5153881743915895, Valid MAE: 0.2343750025633457, Test RMSE: 0.3922321306000044, Test MAE: 0.15384611262528342

Best gamma: 233.57214690901213, best lambda: 1e-06, best score: 0.5374046935341796
freesolv (1 tasks) {'dataset': 'freesolv', 'splitter': 'random', 'task': 'regression_kernel_ridge', 'gammas': array([1.00000000e-03, 2.06913808e-03, 4.28133240e-03, 8.85866790e-03, 1.83298071e-02, 3.79269019e-02, 7.84759970e-02, 1.62377674e-01, 3.35981829e-01, 6.95192796e-01, 1.43844989e+00, 2.97635144e+00, 6.15848211e+00, 1.27427499e+01, 2.63665090e+01, 5.45559478e+01, 1.12883789e+02, 2.33572147e+02, 4.83293024e+02, 1.00000000e+03]), 'lambdas': [1e-06], 'augment': 0, 'k': 10, 'preprocess': False, 'sub_sample': 0.0, 'is_imbalanced': True, 'n': 4}
Valid RMSE: 0.6123723624502961, Valid MAE: 0.3437504911012943, Test RMSE: 0.4803842894154955, Test MAE: 0.23076915489395283

Best gamma: 233.57214690901213, best lambda: 1e-06, best score: 0.4986789990871282
freesolv (1 tasks) {'dataset': 'freesolv', 'splitter': 'random', 'task': 'regression_kernel_ridge', 'gammas': array([1.00000000e-03, 2.06913808e-03, 4.28133240e-03, 8.85866790e-03, 1.83298071e-02, 3.79269019e-02, 7.84759970e-02, 1.62377674e-01, 3.35981829e-01, 6.95192796e-01, 1.43844989e+00, 2.97635144e+00, 6.15848211e+00, 1.27427499e+01, 2.63665090e+01, 5.45559478e+01, 1.12883789e+02, 2.33572147e+02, 4.83293024e+02, 1.00000000e+03]), 'lambdas': [1e-06], 'augment': 0, 'k': 10, 'preprocess': False, 'sub_sample': 0.0, 'is_imbalanced': True, 'n': 4}
Valid RMSE: 0.4330120246963194, Valid MAE: 0.1874997292927722, Test RMSE: 0.8412444017653523, Test MAE: 0.3384619536286607

Best gamma: 233.57214690901213, best lambda: 1e-06, best score: 0.46165073593466427
freesolv (1 tasks) {'dataset': 'freesolv', 'splitter': 'random', 'task': 'regression_kernel_ridge', 'gammas': array([1.00000000e-03, 2.06913808e-03, 4.28133240e-03, 8.85866790e-03, 1.83298071e-02, 3.79269019e-02, 7.84759970e-02, 1.62377674e-01, 3.35981829e-01, 6.95192796e-01, 1.43844989e+00, 2.97635144e+00, 6.15848211e+00, 1.27427499e+01, 2.63665090e+01, 5.45559478e+01, 1.12883789e+02, 2.33572147e+02, 4.83293024e+02, 1.00000000e+03]), 'lambdas': [1e-06], 'augment': 0, 'k': 10, 'preprocess': False, 'sub_sample': 0.0, 'is_imbalanced': True, 'n': 4}
Valid RMSE: 0.9270241149760475, Valid MAE: 0.3906277785628098, Test RMSE: 0.5817739380528093, Test MAE: 0.24615353909749674
WITH KNN REGRESSION
freesolv (1 tasks) {'dataset': 'freesolv', 'splitter': 'random', 'task': 'regression_knn', 'k': 10, 'augment': 0, 'preprocess': False, 'sub_sample': 0.0, 'is_imbalanced': True, 'n': 4}
Valid RMSE: 0.5212664865498261, Valid MAE: 0.2953125, Test RMSE: 0.7570184430229709, Test MAE: 0.42615384615384616

freesolv (1 tasks) {'dataset': 'freesolv', 'splitter': 'random', 'task': 'regression_knn', 'k': 10, 'augment': 0, 'preprocess': False, 'sub_sample': 0.0, 'is_imbalanced': True, 'n': 4}
Valid RMSE: 0.5604127942864974, Valid MAE: 0.38125, Test RMSE: 0.4629337882545325, Test MAE: 0.3400000000000001

freesolv (1 tasks) {'dataset': 'freesolv', 'splitter': 'random', 'task': 'regression_knn', 'k': 10, 'augment': 0, 'preprocess': False, 'sub_sample': 0.0, 'is_imbalanced': True, 'n': 4}
Valid RMSE: 0.43892624665198593, Valid MAE: 0.2796875, Test RMSE: 0.5688044141341219, Test MAE: 0.3553846153846154

freesolv (1 tasks) {'dataset': 'freesolv', 'splitter': 'random', 'task': 'regression_knn', 'k': 10, 'augment': 0, 'preprocess': False, 'sub_sample': 0.0, 'is_imbalanced': True, 'n': 4}
Valid RMSE: 0.7024510659113559, Valid MAE: 0.42500000000000004, Test RMSE: 0.5196152422706632, Test MAE: 0.358461538461538
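As a rough way to compare the two methods, the test metrics from the four logged runs of each can be averaged. The sketch below copies the logged test values, rounded to four decimals. It assumes the four runs per method are comparable random splits, which the logs do not state explicitly, and the variable names are only illustrative.

```python
import numpy as np

# Test metrics copied from the logged runs, rounded to 4 decimals.
results = {
    "krr_test_rmse": np.array([0.3922, 0.4804, 0.8412, 0.5818]),
    "krr_test_mae": np.array([0.1538, 0.2308, 0.3385, 0.2462]),
    "knn_test_rmse": np.array([0.7570, 0.4629, 0.5688, 0.5196]),
    "knn_test_mae": np.array([0.4262, 0.3400, 0.3554, 0.3585]),
}

# Mean of each metric over the four runs.
means = {name: float(values.mean()) for name, values in results.items()}

for name, value in means.items():
    print(f"{name}: {value:.4f}")
```

With only four runs per method the averages are noisy, so they are best read as a sanity check on the MAE and RMSE trend rather than as a final comparison.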
Added hyperparameter optimization for kernel ridge regression, `cross_val_and_fit_kernel_ridge`, with an extensive docstring. In particular, the user needs to supply a range of kernel width parameters as well as regularization values. I also suggest renaming the `regression` function to `regression_knn` to separate it from the kernel ridge regression function `regression_kernel_ridge`. Added a new dictionary entry to the benchmark `main()` function too. As expected, kernel ridge regression is very slow because of the pairwise computation of the kernel matrix. I will share the FreeSolv results as soon as the run is done and compare with KNN!
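For readers unfamiliar with the approach, here is a minimal sketch of a cross-validated grid search over kernel width and regularization for kernel ridge regression. It is not the implementation of `cross_val_and_fit_kernel_ridge`. The Gaussian kernel on Euclidean distances, the fold count, the RMSE selection criterion, the synthetic data, and all function names below are assumptions made for illustration. The gamma grid is chosen to match the 20 values shown in the benchmark logs, which coincide with `np.logspace(-3, 3, 20)`.

```python
import numpy as np

def gaussian_kernel(A, B, gamma):
    # Pairwise squared Euclidean distances between rows of A and rows of B.
    # This O(n^2) step is what makes kernel methods slow on larger datasets.
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def krr_fit_predict(X_train, y_train, X_test, gamma, lam):
    # Solve (K + lam * I) alpha = y, then predict with the test/train kernel.
    K = gaussian_kernel(X_train, X_train, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(X_train)), y_train)
    return gaussian_kernel(X_test, X_train, gamma) @ alpha

def grid_search_krr(X, y, gammas, lambdas, n_folds=5, seed=0):
    # Hypothetical helper: returns (best_gamma, best_lambda, best_cv_rmse).
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    best = (None, None, np.inf)
    for gamma in gammas:
        for lam in lambdas:
            fold_rmse = []
            for i, val_idx in enumerate(folds):
                train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
                pred = krr_fit_predict(X[train_idx], y[train_idx], X[val_idx], gamma, lam)
                fold_rmse.append(np.sqrt(np.mean((pred - y[val_idx]) ** 2)))
            score = float(np.mean(fold_rmse))
            if score < best[2]:
                best = (float(gamma), float(lam), score)
    return best

# Synthetic stand-in data (assumption: real inputs would be molecular features).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(80, 2))
y = np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2

gammas = np.logspace(-3, 3, 20)  # same grid as in the logged config
lambdas = [1e-6]                 # same single value as in the logged config
best_gamma, best_lambda, best_score = grid_search_krr(X, y, gammas, lambdas)
print(f"Best gamma: {best_gamma}, best lambda: {best_lambda}, best score: {best_score}")
```

Each (gamma, lambda) pair requires building and solving a kernel system per fold, so the cost grows with the grid size times the number of folds. Caching the pairwise distance matrix once and reusing it across gammas would be one way to reduce the runtime.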