MDAIceland / WaterSecurity

ML Model Implementation #31

Closed VasLem closed 3 years ago

VasLem commented 3 years ago

This is the next step after #10. We now have an idea of the dimensionality of the features and labels, which should be enough to create a model prototype based on the research. The model can be built with pipelines and by subclassing base classes, so that it is packaged as an sklearn-like class.
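A minimal sketch of what such an sklearn-like class could look like. The name `RiskModel`, the `alpha` hyperparameter, and the choice of `StandardScaler` plus `Ridge` are assumptions for illustration only, not the project's actual model.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class RiskModel(RegressorMixin, BaseEstimator):
    """Hypothetical wrapper: scales the features, then fits a regressor."""

    def __init__(self, alpha=1.0):
        # sklearn convention: __init__ only stores hyperparameters.
        self.alpha = alpha

    def fit(self, X, y):
        # Build the pipeline at fit time so get_params/set_params keep working.
        self.pipeline_ = Pipeline([
            ("scale", StandardScaler()),
            ("model", Ridge(alpha=self.alpha)),
        ])
        self.pipeline_.fit(X, y)
        return self

    def predict(self, X):
        return self.pipeline_.predict(X)


# Synthetic data, only to show that the class behaves like any sklearn estimator.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=40)
model = RiskModel(alpha=0.5).fit(X, y)
predictions = model.predict(X)
```

Because the class inherits from `BaseEstimator`, it can be passed to tools such as `cross_val_score` or a grid search like any built-in estimator.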

OlympiaG commented 3 years ago

I tried to create a Multinomial Logistic Regression classifier that predicts the probability of each class (the different risks). I am not at all sure whether this is the correct approach or whether the code is efficient. @bajo1207, could you check it and give me your opinion? If it's wrong or doesn't work, I will rewrite or change it (I can send you the .py).
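For reference, a rough sketch of that kind of classifier in scikit-learn. The data is synthetic and the three classes only stand in for the different risks; this is not the .py mentioned above. With its default `lbfgs` solver, `LogisticRegression` fits a multinomial (softmax) model when the target has more than two classes.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Three synthetic, well-separated classes standing in for the different risks.
centers = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])
y = np.repeat([0, 1, 2], 30)

clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)

# One row per sample, one column per class; each row sums to 1.
proba = clf.predict_proba(X)
```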

ekaan commented 3 years ago

@antosalerno do you need any assistance, or are you just changing the parameters to see which ones work best? I can try to help.

antosalerno commented 3 years ago

Exactly, that's what I'm doing. I'm performing cross-validation and storing the best parameters in a file.

VasLem commented 3 years ago

@antosalerno could you please share the structure of the expected file, so that I can start working on the equivalent Python implementation?

antosalerno commented 3 years ago

Is a dictionary saved as a .pkl file OK?
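Something along these lines, as a sketch. The layout (risk name mapped to its best hyperparameters), the file name `best_params.pkl`, and the values are placeholders, not the actual results.

```python
import os
import pickle
import tempfile

# Hypothetical layout: risk name -> best hyperparameters found by cross-validation.
best_params = {
    "Higher water prices": {"max_depth": 3, "n_estimators": 500},
    "Regulatory": {"max_depth": 2, "n_estimators": 300},
}

# Write the dictionary to a .pkl file (a temporary directory is used here).
path = os.path.join(tempfile.mkdtemp(), "best_params.pkl")
with open(path, "wb") as f:
    pickle.dump(best_params, f)

# Whoever consumes the file reads it back as a plain dict.
with open(path, "rb") as f:
    loaded = pickle.load(f)
```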

VasLem commented 3 years ago

The following is the output from the Bayesian optimization. I have added more commits. @antosalerno, you can use the notebook XGBoost Fitting Using Bayesian Optimization.ipynb as a reference. I have run the algorithm end to end. I also added the elevation feature for each city; I just noticed I could retrieve it from the geocities :D. You are welcome to pull and experiment.

Risk: Higher water prices Samples Size: 87

iter target alpha colsam... gamma max_depth n_estimators
1 -1.029 10.69 0.6609 0.8468 6.822 474.6
2 -1.057 8.866 0.4458 0.9086 1.264 577.8
3 -1.02 12.46 0.3204 0.4169 3.166 934.4
4 -1.062 11.14 0.7036 0.2152 2.133 483.0
5 -1.099 3.302 0.7593 0.5758 4.68 407.2
6 -1.261 4.197 0.3899 0.6167 3.708 355.4
7 -1.543 0.7062 0.4681 0.9942 1.768 420.4
8 -1.065 16.42 0.544 0.1535 5.441 481.7
9 -1.306 4.866 0.5051 0.2079 5.595 869.3
10 -1.317 0.3069 0.7189 0.6719 4.606 952.0
11 -1.085 18.14 0.3 0.2972 2.536 926.1
12 -1.204 4.226 0.3 0.8667 1.504 926.5
13 -1.079 19.73 0.3395 0.4359 6.486 938.3
14 -1.073 18.56 0.9 0.0 1.0 468.6
15 -1.329 4.181 0.4546 0.6282 6.974 394.8
16 -1.091 6.004 0.3008 0.8962 2.36 466.8
17 -1.432 0.2857 0.9 0.0 7.0 478.9
18 -1.098 15.43 0.5263 0.7172 1.138 475.5
19 -1.084 15.68 0.3786 0.7065 6.862 462.9
20 -1.041 13.68 0.3 0.0 1.0 942.2

Risk: Inadequate or aging infrastructure Samples Size: 148

iter target alpha colsam... gamma max_depth n_estimators
1 -0.9487 3.223 0.4324 0.2213 6.649 901.8
2 -0.7952 7.654 0.6042 0.8987 2.619 887.6
3 -0.7991 13.49 0.7745 0.3991 1.687 890.5
4 -0.8813 6.473 0.4275 0.03099 4.635 856.4
5 -0.8075 16.57 0.7656 0.5099 2.545 412.5
6 -0.8715 5.347 0.6474 0.2598 4.645 650.2
7 -0.8213 17.52 0.4126 0.8959 1.846 286.9
8 -0.9036 2.187 0.3741 0.373 5.926 932.2
9 -0.9287 3.209 0.8861 0.2588 5.442 311.7
10 -0.8093 19.03 0.4964 0.03291 5.319 661.6
11 -0.7864 13.08 0.7076 1.0 1.0 882.6
12 -0.942 2.054 0.6671 0.5189 2.895 877.2
13 -0.8149 13.78 0.3 1.0 7.0 885.6
14 -0.806 18.86 0.7604 0.9814 2.055 884.3
15 -0.8846 5.936 0.5739 0.3046 4.683 410.1
16 -0.8096 18.49 0.615 0.27 3.388 421.8
17 -0.839 19.85 0.7831 0.1993 6.613 671.8
18 -0.8302 20.0 0.3 1.0 1.0 402.2
19 -0.8294 18.93 0.8319 0.1886 4.519 433.6
20 -0.8084 20.0 0.3129 0.1215 4.873 274.6

Risk: Increased water stress or scarcity Samples Size: 261

iter target alpha colsam... gamma max_depth n_estimators
1 -0.3626 2.766 0.4401 0.3459 5.671 613.4
2 -0.3388 7.764 0.3748 0.2113 4.651 944.9
3 -0.3506 2.365 0.3556 0.6195 6.547 955.9
4 -0.3364 6.016 0.8841 0.9525 2.226 460.3
5 -0.3388 17.84 0.3491 0.07574 5.649 953.1
6 -0.3588 18.97 0.877 0.3896 5.064 772.2
7 -0.3311 11.32 0.8818 0.6683 5.476 802.1
8 -0.3343 12.86 0.3972 0.07618 2.339 850.1
9 -0.3335 5.983 0.8199 0.8885 4.27 630.0
10 -0.3256 5.92 0.3834 0.8878 3.168 526.8
11 -0.3315 2.648 0.4765 0.7296 4.392 895.3
12 -0.3338 18.77 0.481 0.5995 1.645 803.7
13 -0.3317 13.77 0.3929 0.2846 3.817 525.9
14 -0.3744 1.682 0.6329 0.02555 6.661 517.7
15 -0.3286 9.322 0.4649 0.3631 3.073 530.0
16 -0.3459 2.841 0.5927 0.5091 1.658 532.3
17 -0.3276 9.349 0.4026 0.2734 6.366 528.0
18 -0.3298 12.92 0.8218 0.9314 4.781 809.7
19 -0.3396 5.678 0.6093 0.3203 1.765 807.0
20 -0.3583 19.56 0.8028 0.4251 6.82 813.8

Risk: Declining water quality Samples Size: 183

iter target alpha colsam... gamma max_depth n_estimators
1 -1.017 15.24 0.5021 0.4765 4.03 955.4
2 -1.146 3.951 0.3069 0.2203 3.108 673.4
3 -1.026 4.0 0.7201 0.9597 4.941 961.3
4 -1.141 9.27 0.7984 0.171 5.715 468.6
5 -0.9946 7.37 0.6244 0.01921 2.888 307.8
6 -1.134 6.55 0.5938 0.01857 6.075 554.2
7 -1.025 17.2 0.7216 0.841 5.813 325.2
8 -1.075 19.34 0.7808 0.1896 6.464 395.2
9 -0.9351 7.142 0.54 0.1324 1.457 764.5
10 -1.018 16.05 0.6447 0.4008 4.759 461.8
11 -0.9565 7.743 0.5571 0.3126 2.624 307.9
12 -0.8607 6.901 0.5264 0.9323 1.247 763.3
13 -0.8679 7.995 0.8548 0.9833 1.993 763.2
14 -0.9363 7.28 0.8611 0.5022 2.088 761.0
15 -1.057 5.228 0.6352 0.4284 3.399 763.0
16 -1.059 9.406 0.4173 0.1764 3.534 762.6
17 -0.9167 8.846 0.542 0.05328 1.301 309.6
18 -0.8616 9.363 0.6971 0.9884 1.129 763.9
19 -0.9114 11.23 0.8064 0.8312 2.234 308.9
20 -0.8564 7.754 0.4608 0.7676 1.014 762.1

Risk: Increased water demand Samples Size: 98

iter target alpha colsam... gamma max_depth n_estimators
1 -1.217 18.02 0.6812 0.5451 6.896 643.6
2 -1.229 19.69 0.4645 0.7135 5.75 721.9
3 -1.222 16.56 0.8659 0.4097 5.167 530.5
4 -1.209 19.16 0.6947 0.9955 5.455 651.1
5 -1.154 12.16 0.3491 0.1814 4.748 410.0
6 -1.193 19.48 0.8924 0.8026 1.26 248.8
7 -1.186 8.315 0.3632 0.9371 1.101 847.2
8 -1.222 11.47 0.671 0.6548 4.981 650.4
9 -1.175 15.08 0.5308 0.3679 6.299 555.1
10 -1.236 17.97 0.6862 0.1582 3.3 737.8
11 -1.183 11.06 0.5585 0.6782 5.373 409.1
12 -1.183 12.51 0.4233 0.1829 3.819 410.7
13 -1.183 11.49 0.5731 0.2041 5.964 410.1
14 -1.192 12.78 0.7163 0.9359 5.117 410.6
15 -1.172 10.66 0.3185 0.1697 3.579 409.3
16 -1.226 13.66 0.8328 0.2321 5.406 408.6
17 -1.203 11.45 0.5863 0.5618 4.004 410.6
18 -1.222 9.69 0.5864 0.08564 4.079 408.6
19 -1.178 8.321 0.3652 0.7539 1.638 847.6
20 -1.152 12.15 0.857 0.2001 3.915 410.0

Risk: Regulatory Samples Size: 65

iter target alpha colsam... gamma max_depth n_estimators
1 -0.6468 15.19 0.8833 0.1425 1.581 478.6
2 -0.6468 14.5 0.6182 0.6762 4.132 506.6
3 -0.7314 7.599 0.5093 0.5005 5.549 449.4
4 -0.6495 12.97 0.7692 0.7693 6.901 950.7
5 -0.6468 15.29 0.3974 0.7186 4.209 551.4
6 -0.7587 5.216 0.8815 0.8894 4.746 758.1
7 -0.6468 18.41 0.3315 0.4609 5.848 485.1
8 -0.6468 13.9 0.5407 0.595 3.143 348.1
9 -0.6468 16.1 0.4538 0.407 1.782 290.3
10 -0.8251 4.691 0.6606 0.373 5.543 788.6
11 -0.9881 0.0 0.9 0.9517 1.0 618.7
12 -0.8716 0.3495 0.4567 0.1852 4.343 213.2
13 -0.9698 2.446 0.8587 0.3136 3.82 999.7
14 -0.6468 12.5 0.664 0.9763 2.782 348.4
15 -0.6468 20.0 0.4687 1.0 7.0 904.0
16 -0.938 0.0 0.9 0.0 7.0 317.0
17 -0.6468 17.74 0.9 1.0 1.0 375.6
18 -1.197 0.0 0.9 0.0 1.0 924.7
19 -0.9668 0.0 0.3 1.0 1.0 492.3
20 -0.6739 13.74 0.3 0.0 7.0 362.6

Risk: Energy supply issues Samples Size: 59

iter target alpha colsam... gamma max_depth n_estimators
1 -0.6747 1.412 0.7192 0.7479 2.893 656.5
2 -0.6753 6.899 0.3909 0.7769 3.047 501.6
3 -0.7638 0.4693 0.4836 0.5069 5.302 690.3
4 -0.7093 1.515 0.7129 0.205 4.67 657.3
5 -0.6518 18.39 0.8179 0.3172 6.033 759.0
6 -0.6515 13.71 0.7813 0.5914 5.963 740.3
7 -0.6518 16.69 0.7496 0.4808 2.56 485.7
8 -0.6518 19.3 0.6752 0.9545 1.855 787.9
9 -0.6516 13.54 0.7421 0.4082 3.906 304.4
10 -0.6509 13.06 0.6228 0.8618 4.04 909.1
11 -0.6518 15.0 0.8898 0.3295 2.928 908.8
12 -0.6518 15.03 0.4379 0.7111 4.273 913.7
13 -0.6288 9.693 0.4822 0.1611 1.029 913.5
14 -0.6448 8.011 0.699 0.7634 4.769 915.0
15 -0.676 5.709 0.3277 0.8865 1.858 911.2
16 -0.6493 11.84 0.6053 0.4917 1.769 917.5
17 -0.6545 6.212 0.3351 0.541 1.303 918.5
18 -0.6515 13.68 0.5229 0.237 6.617 746.9
19 -0.6518 18.24 0.3344 0.6481 5.667 752.2
20 -0.6518 18.07 0.7913 0.7944 2.729 744.6
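
A small sketch of how one of these tables can be read. It assumes `target` is a negated cross-validation error that the optimizer maximizes, so the best iteration is the one with the highest (least negative) target. The helper `best_row` is hypothetical, and rounding the sampled floats to integers is an assumption; the notebook may convert them differently.

```python
# Column names as printed in the log; "colsam..." is truncated there and kept as-is.
COLUMNS = ["iter", "target", "alpha", "colsam...", "gamma", "max_depth", "n_estimators"]


def best_row(log_lines):
    """Return the row with the highest target as a dict of floats."""
    rows = [dict(zip(COLUMNS, map(float, line.split()))) for line in log_lines]
    return max(rows, key=lambda row: row["target"])


# Three rows copied from the "Declining water quality" table above.
sample = [
    "12 -0.8607 6.901 0.5264 0.9323 1.247 763.3",
    "18 -0.8616 9.363 0.6971 0.9884 1.129 763.9",
    "20 -0.8564 7.754 0.4608 0.7676 1.014 762.1",
]
best = best_row(sample)

# Tree depth and tree count have to be integers before they reach the model.
max_depth = int(round(best["max_depth"]))
n_estimators = int(round(best["n_estimators"]))
```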

antosalerno commented 3 years ago

Great @VasLem!

VasLem commented 3 years ago

This is not the final version; I just plugged things in. If any of you have managed to get better results, please commit them!!! Also, I cheated a little there: I didn't use a test set, only the CV :fearful:
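For later, a sketch of the usual way to add that held-out test set: split first, tune with CV on the training part only, and score once on the untouched rows. The data is synthetic and `Ridge` is only a stand-in for the XGBoost model.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = X @ np.array([1.0, 0.5, -1.0, 0.0]) + rng.normal(scale=0.3, size=100)

# Set the test rows aside before any tuning touches the data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = Ridge(alpha=1.0)
# Cross-validation runs on the training portion only; sklearn reports negated MSE.
cv_mse = -cross_val_score(
    model, X_train, y_train, cv=5, scoring="neg_mean_squared_error"
).mean()

# Final, one-off check on rows the tuning never saw.
model.fit(X_train, y_train)
test_mse = mean_squared_error(y_test, model.predict(X_test))
```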

VasLem commented 3 years ago

@OlympiaG do we have any news about the Bagging option? Currently I report an MSE close to 1 for most risks, which sucks a little...