Closed VasLem closed 3 years ago
I tried to create a Multinomial Logistic Regression classifier to predict the class probabilities (the different risks). I am not at all sure that this is the right approach or that the code is efficient. @bajo1207, could you check it and tell me what you think? If it is wrong or doesn't work, I will rewrite or change it (I can send you the .py).
@antosalerno do you need any assistance, or are you just changing the parameters to see which ones work best? I could try to help.
Exactly, that's what I'm doing. I'm performing cross validation and storing in a file the best parameters.
Could you please describe the structure of the expected file, so that I can start working on the equivalent Python implementation? @antosalerno
Is a dictionary saved as a .pkl file OK?
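For reference, a minimal sketch of what such a file could look like, assuming one entry per risk that maps to the best parameters found by cross-validation. The layout, the helper names `save_params` / `load_params`, and the parameter names and values are all placeholders for illustration, not actual results.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical layout: risk name -> best hyperparameters found by CV.
# Parameter names and values are placeholders, not real results.
best_params = {
    "Higher water prices": {"C": 1.0, "penalty": "l2"},
    "Regulatory": {"C": 0.1, "penalty": "l2"},
}


def save_params(params, path):
    """Serialize the parameter dictionary to a .pkl file."""
    with open(path, "wb") as fh:
        pickle.dump(params, fh)


def load_params(path):
    """Read the parameter dictionary back from a .pkl file."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Round trip through a temporary directory so nothing is left in the repo.
path = Path(tempfile.mkdtemp()) / "best_params.pkl"
save_params(best_params, path)
loaded = load_params(path)
```

A plain dictionary keeps the file readable from both the notebook and a .py script; JSON would also work as long as every value is a basic type.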
The following is the output of the Bayesian optimization. I have added more commits. @antosalerno, you can use the notebook XGBoost Fitting Using Bayesian Optimization.ipynb as a reference. I have run the algorithm end to end. I also added the elevation feature for each city; I just noticed I could retrieve it from the geocities :D. You are welcome to pull and experiment.
Risk: Higher water prices Samples Size: 87
iter | target | alpha | colsam... | gamma | max_depth | n_estimators |
---|---|---|---|---|---|---|
1 | -1.029 | 10.69 | 0.6609 | 0.8468 | 6.822 | 474.6 |
2 | -1.057 | 8.866 | 0.4458 | 0.9086 | 1.264 | 577.8 |
3 | -1.02 | 12.46 | 0.3204 | 0.4169 | 3.166 | 934.4 |
4 | -1.062 | 11.14 | 0.7036 | 0.2152 | 2.133 | 483.0 |
5 | -1.099 | 3.302 | 0.7593 | 0.5758 | 4.68 | 407.2 |
6 | -1.261 | 4.197 | 0.3899 | 0.6167 | 3.708 | 355.4 |
7 | -1.543 | 0.7062 | 0.4681 | 0.9942 | 1.768 | 420.4 |
8 | -1.065 | 16.42 | 0.544 | 0.1535 | 5.441 | 481.7 |
9 | -1.306 | 4.866 | 0.5051 | 0.2079 | 5.595 | 869.3 |
10 | -1.317 | 0.3069 | 0.7189 | 0.6719 | 4.606 | 952.0 |
11 | -1.085 | 18.14 | 0.3 | 0.2972 | 2.536 | 926.1 |
12 | -1.204 | 4.226 | 0.3 | 0.8667 | 1.504 | 926.5 |
13 | -1.079 | 19.73 | 0.3395 | 0.4359 | 6.486 | 938.3 |
14 | -1.073 | 18.56 | 0.9 | 0.0 | 1.0 | 468.6 |
15 | -1.329 | 4.181 | 0.4546 | 0.6282 | 6.974 | 394.8 |
16 | -1.091 | 6.004 | 0.3008 | 0.8962 | 2.36 | 466.8 |
17 | -1.432 | 0.2857 | 0.9 | 0.0 | 7.0 | 478.9 |
18 | -1.098 | 15.43 | 0.5263 | 0.7172 | 1.138 | 475.5 |
19 | -1.084 | 15.68 | 0.3786 | 0.7065 | 6.862 | 462.9 |
20 | -1.041 | 13.68 | 0.3 | 0.0 | 1.0 | 942.2 |
Risk: Inadequate or aging infrastructure Samples Size: 148
iter | target | alpha | colsam... | gamma | max_depth | n_estimators |
---|---|---|---|---|---|---|
1 | -0.9487 | 3.223 | 0.4324 | 0.2213 | 6.649 | 901.8 |
2 | -0.7952 | 7.654 | 0.6042 | 0.8987 | 2.619 | 887.6 |
3 | -0.7991 | 13.49 | 0.7745 | 0.3991 | 1.687 | 890.5 |
4 | -0.8813 | 6.473 | 0.4275 | 0.03099 | 4.635 | 856.4 |
5 | -0.8075 | 16.57 | 0.7656 | 0.5099 | 2.545 | 412.5 |
6 | -0.8715 | 5.347 | 0.6474 | 0.2598 | 4.645 | 650.2 |
7 | -0.8213 | 17.52 | 0.4126 | 0.8959 | 1.846 | 286.9 |
8 | -0.9036 | 2.187 | 0.3741 | 0.373 | 5.926 | 932.2 |
9 | -0.9287 | 3.209 | 0.8861 | 0.2588 | 5.442 | 311.7 |
10 | -0.8093 | 19.03 | 0.4964 | 0.03291 | 5.319 | 661.6 |
11 | -0.7864 | 13.08 | 0.7076 | 1.0 | 1.0 | 882.6 |
12 | -0.942 | 2.054 | 0.6671 | 0.5189 | 2.895 | 877.2 |
13 | -0.8149 | 13.78 | 0.3 | 1.0 | 7.0 | 885.6 |
14 | -0.806 | 18.86 | 0.7604 | 0.9814 | 2.055 | 884.3 |
15 | -0.8846 | 5.936 | 0.5739 | 0.3046 | 4.683 | 410.1 |
16 | -0.8096 | 18.49 | 0.615 | 0.27 | 3.388 | 421.8 |
17 | -0.839 | 19.85 | 0.7831 | 0.1993 | 6.613 | 671.8 |
18 | -0.8302 | 20.0 | 0.3 | 1.0 | 1.0 | 402.2 |
19 | -0.8294 | 18.93 | 0.8319 | 0.1886 | 4.519 | 433.6 |
20 | -0.8084 | 20.0 | 0.3129 | 0.1215 | 4.873 | 274.6 |
Risk: Increased water stress or scarcity Samples Size: 261
iter | target | alpha | colsam... | gamma | max_depth | n_estimators |
---|---|---|---|---|---|---|
1 | -0.3626 | 2.766 | 0.4401 | 0.3459 | 5.671 | 613.4 |
2 | -0.3388 | 7.764 | 0.3748 | 0.2113 | 4.651 | 944.9 |
3 | -0.3506 | 2.365 | 0.3556 | 0.6195 | 6.547 | 955.9 |
4 | -0.3364 | 6.016 | 0.8841 | 0.9525 | 2.226 | 460.3 |
5 | -0.3388 | 17.84 | 0.3491 | 0.07574 | 5.649 | 953.1 |
6 | -0.3588 | 18.97 | 0.877 | 0.3896 | 5.064 | 772.2 |
7 | -0.3311 | 11.32 | 0.8818 | 0.6683 | 5.476 | 802.1 |
8 | -0.3343 | 12.86 | 0.3972 | 0.07618 | 2.339 | 850.1 |
9 | -0.3335 | 5.983 | 0.8199 | 0.8885 | 4.27 | 630.0 |
10 | -0.3256 | 5.92 | 0.3834 | 0.8878 | 3.168 | 526.8 |
11 | -0.3315 | 2.648 | 0.4765 | 0.7296 | 4.392 | 895.3 |
12 | -0.3338 | 18.77 | 0.481 | 0.5995 | 1.645 | 803.7 |
13 | -0.3317 | 13.77 | 0.3929 | 0.2846 | 3.817 | 525.9 |
14 | -0.3744 | 1.682 | 0.6329 | 0.02555 | 6.661 | 517.7 |
15 | -0.3286 | 9.322 | 0.4649 | 0.3631 | 3.073 | 530.0 |
16 | -0.3459 | 2.841 | 0.5927 | 0.5091 | 1.658 | 532.3 |
17 | -0.3276 | 9.349 | 0.4026 | 0.2734 | 6.366 | 528.0 |
18 | -0.3298 | 12.92 | 0.8218 | 0.9314 | 4.781 | 809.7 |
19 | -0.3396 | 5.678 | 0.6093 | 0.3203 | 1.765 | 807.0 |
20 | -0.3583 | 19.56 | 0.8028 | 0.4251 | 6.82 | 813.8 |
Risk: Declining water quality Samples Size: 183
iter | target | alpha | colsam... | gamma | max_depth | n_estimators |
---|---|---|---|---|---|---|
1 | -1.017 | 15.24 | 0.5021 | 0.4765 | 4.03 | 955.4 |
2 | -1.146 | 3.951 | 0.3069 | 0.2203 | 3.108 | 673.4 |
3 | -1.026 | 4.0 | 0.7201 | 0.9597 | 4.941 | 961.3 |
4 | -1.141 | 9.27 | 0.7984 | 0.171 | 5.715 | 468.6 |
5 | -0.9946 | 7.37 | 0.6244 | 0.01921 | 2.888 | 307.8 |
6 | -1.134 | 6.55 | 0.5938 | 0.01857 | 6.075 | 554.2 |
7 | -1.025 | 17.2 | 0.7216 | 0.841 | 5.813 | 325.2 |
8 | -1.075 | 19.34 | 0.7808 | 0.1896 | 6.464 | 395.2 |
9 | -0.9351 | 7.142 | 0.54 | 0.1324 | 1.457 | 764.5 |
10 | -1.018 | 16.05 | 0.6447 | 0.4008 | 4.759 | 461.8 |
11 | -0.9565 | 7.743 | 0.5571 | 0.3126 | 2.624 | 307.9 |
12 | -0.8607 | 6.901 | 0.5264 | 0.9323 | 1.247 | 763.3 |
13 | -0.8679 | 7.995 | 0.8548 | 0.9833 | 1.993 | 763.2 |
14 | -0.9363 | 7.28 | 0.8611 | 0.5022 | 2.088 | 761.0 |
15 | -1.057 | 5.228 | 0.6352 | 0.4284 | 3.399 | 763.0 |
16 | -1.059 | 9.406 | 0.4173 | 0.1764 | 3.534 | 762.6 |
17 | -0.9167 | 8.846 | 0.542 | 0.05328 | 1.301 | 309.6 |
18 | -0.8616 | 9.363 | 0.6971 | 0.9884 | 1.129 | 763.9 |
19 | -0.9114 | 11.23 | 0.8064 | 0.8312 | 2.234 | 308.9 |
20 | -0.8564 | 7.754 | 0.4608 | 0.7676 | 1.014 | 762.1 |
Risk: Increased water demand Samples Size: 98
iter | target | alpha | colsam... | gamma | max_depth | n_estimators |
---|---|---|---|---|---|---|
1 | -1.217 | 18.02 | 0.6812 | 0.5451 | 6.896 | 643.6 |
2 | -1.229 | 19.69 | 0.4645 | 0.7135 | 5.75 | 721.9 |
3 | -1.222 | 16.56 | 0.8659 | 0.4097 | 5.167 | 530.5 |
4 | -1.209 | 19.16 | 0.6947 | 0.9955 | 5.455 | 651.1 |
5 | -1.154 | 12.16 | 0.3491 | 0.1814 | 4.748 | 410.0 |
6 | -1.193 | 19.48 | 0.8924 | 0.8026 | 1.26 | 248.8 |
7 | -1.186 | 8.315 | 0.3632 | 0.9371 | 1.101 | 847.2 |
8 | -1.222 | 11.47 | 0.671 | 0.6548 | 4.981 | 650.4 |
9 | -1.175 | 15.08 | 0.5308 | 0.3679 | 6.299 | 555.1 |
10 | -1.236 | 17.97 | 0.6862 | 0.1582 | 3.3 | 737.8 |
11 | -1.183 | 11.06 | 0.5585 | 0.6782 | 5.373 | 409.1 |
12 | -1.183 | 12.51 | 0.4233 | 0.1829 | 3.819 | 410.7 |
13 | -1.183 | 11.49 | 0.5731 | 0.2041 | 5.964 | 410.1 |
14 | -1.192 | 12.78 | 0.7163 | 0.9359 | 5.117 | 410.6 |
15 | -1.172 | 10.66 | 0.3185 | 0.1697 | 3.579 | 409.3 |
16 | -1.226 | 13.66 | 0.8328 | 0.2321 | 5.406 | 408.6 |
17 | -1.203 | 11.45 | 0.5863 | 0.5618 | 4.004 | 410.6 |
18 | -1.222 | 9.69 | 0.5864 | 0.08564 | 4.079 | 408.6 |
19 | -1.178 | 8.321 | 0.3652 | 0.7539 | 1.638 | 847.6 |
20 | -1.152 | 12.15 | 0.857 | 0.2001 | 3.915 | 410.0 |
Risk: Regulatory Samples Size: 65
iter | target | alpha | colsam... | gamma | max_depth | n_estimators |
---|---|---|---|---|---|---|
1 | -0.6468 | 15.19 | 0.8833 | 0.1425 | 1.581 | 478.6 |
2 | -0.6468 | 14.5 | 0.6182 | 0.6762 | 4.132 | 506.6 |
3 | -0.7314 | 7.599 | 0.5093 | 0.5005 | 5.549 | 449.4 |
4 | -0.6495 | 12.97 | 0.7692 | 0.7693 | 6.901 | 950.7 |
5 | -0.6468 | 15.29 | 0.3974 | 0.7186 | 4.209 | 551.4 |
6 | -0.7587 | 5.216 | 0.8815 | 0.8894 | 4.746 | 758.1 |
7 | -0.6468 | 18.41 | 0.3315 | 0.4609 | 5.848 | 485.1 |
8 | -0.6468 | 13.9 | 0.5407 | 0.595 | 3.143 | 348.1 |
9 | -0.6468 | 16.1 | 0.4538 | 0.407 | 1.782 | 290.3 |
10 | -0.8251 | 4.691 | 0.6606 | 0.373 | 5.543 | 788.6 |
11 | -0.9881 | 0.0 | 0.9 | 0.9517 | 1.0 | 618.7 |
12 | -0.8716 | 0.3495 | 0.4567 | 0.1852 | 4.343 | 213.2 |
13 | -0.9698 | 2.446 | 0.8587 | 0.3136 | 3.82 | 999.7 |
14 | -0.6468 | 12.5 | 0.664 | 0.9763 | 2.782 | 348.4 |
15 | -0.6468 | 20.0 | 0.4687 | 1.0 | 7.0 | 904.0 |
16 | -0.938 | 0.0 | 0.9 | 0.0 | 7.0 | 317.0 |
17 | -0.6468 | 17.74 | 0.9 | 1.0 | 1.0 | 375.6 |
18 | -1.197 | 0.0 | 0.9 | 0.0 | 1.0 | 924.7 |
19 | -0.9668 | 0.0 | 0.3 | 1.0 | 1.0 | 492.3 |
20 | -0.6739 | 13.74 | 0.3 | 0.0 | 7.0 | 362.6 |
Risk: Energy supply issues Samples Size: 59
iter | target | alpha | colsam... | gamma | max_depth | n_estimators |
---|---|---|---|---|---|---|
1 | -0.6747 | 1.412 | 0.7192 | 0.7479 | 2.893 | 656.5 |
2 | -0.6753 | 6.899 | 0.3909 | 0.7769 | 3.047 | 501.6 |
3 | -0.7638 | 0.4693 | 0.4836 | 0.5069 | 5.302 | 690.3 |
4 | -0.7093 | 1.515 | 0.7129 | 0.205 | 4.67 | 657.3 |
5 | -0.6518 | 18.39 | 0.8179 | 0.3172 | 6.033 | 759.0 |
6 | -0.6515 | 13.71 | 0.7813 | 0.5914 | 5.963 | 740.3 |
7 | -0.6518 | 16.69 | 0.7496 | 0.4808 | 2.56 | 485.7 |
8 | -0.6518 | 19.3 | 0.6752 | 0.9545 | 1.855 | 787.9 |
9 | -0.6516 | 13.54 | 0.7421 | 0.4082 | 3.906 | 304.4 |
10 | -0.6509 | 13.06 | 0.6228 | 0.8618 | 4.04 | 909.1 |
11 | -0.6518 | 15.0 | 0.8898 | 0.3295 | 2.928 | 908.8 |
12 | -0.6518 | 15.03 | 0.4379 | 0.7111 | 4.273 | 913.7 |
13 | -0.6288 | 9.693 | 0.4822 | 0.1611 | 1.029 | 913.5 |
14 | -0.6448 | 8.011 | 0.699 | 0.7634 | 4.769 | 915.0 |
15 | -0.676 | 5.709 | 0.3277 | 0.8865 | 1.858 | 911.2 |
16 | -0.6493 | 11.84 | 0.6053 | 0.4917 | 1.769 | 917.5 |
17 | -0.6545 | 6.212 | 0.3351 | 0.541 | 1.303 | 918.5 |
18 | -0.6515 | 13.68 | 0.5229 | 0.237 | 6.617 | 746.9 |
19 | -0.6518 | 18.24 | 0.3344 | 0.6481 | 5.667 | 752.2 |
20 | -0.6518 | 18.07 | 0.7913 | 0.7944 | 2.729 | 744.6 |
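Note that the optimizer searches continuous ranges, so max_depth and n_estimators come out as floats in the logs above. A rough sketch of how the best iteration could be picked from such a log and the integer-valued parameters rounded, assuming the pipe-delimited format printed above. `parse_log` and `best_row` are hypothetical helpers, and rounding (rather than truncating) is an assumption, not necessarily what the notebook does. The truncated `colsam...` column name is kept exactly as printed.

```python
# First three rows of the "Higher water prices" log above, copied verbatim.
LOG = """\
iter | target | alpha | colsam... | gamma | max_depth | n_estimators |
---|---|---|---|---|---|---|
1 | -1.029 | 10.69 | 0.6609 | 0.8468 | 6.822 | 474.6 |
2 | -1.057 | 8.866 | 0.4458 | 0.9086 | 1.264 | 577.8 |
3 | -1.02 | 12.46 | 0.3204 | 0.4169 | 3.166 | 934.4 |
"""

# Assumption: these two parameters must be integers when the model is built.
INT_PARAMS = ("max_depth", "n_estimators")


def parse_log(text):
    """Turn the pipe-delimited log into a list of {column: float} dicts."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = [c.strip() for c in lines[0].split("|") if c.strip()]
    rows = []
    for ln in lines[2:]:  # skip the header and the ---|--- separator
        cells = [c.strip() for c in ln.split("|") if c.strip()]
        rows.append(dict(zip(header, map(float, cells))))
    return rows


def best_row(rows):
    """Return the row with the highest target, with integer params rounded."""
    best = dict(max(rows, key=lambda r: r["target"]))
    for name in INT_PARAMS:
        best[name] = int(round(best[name]))
    return best


rows = parse_log(LOG)
best = best_row(rows)
```

The target is maximized here because the logged values are negative, so the value closest to zero is the best one.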
Great @VasLem!
This is not the final version, I just plugged things in. If any of you has managed to get better results, please commit them!!! Also, I cheated a little there: I didn't use a test set, only the CV :fearful:
@OlympiaG do we have any news about the Bagging option? Currently I report an MSE close to 1 for most risks, which sucks a little...
The next step to take after #10. We now have an idea of the dimensionality of the features/labels, which should be enough to create a model prototype based on the research. The model can be built using pipelines and by subclassing base classes, so that it is packaged as an sklearn-like class.
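As a starting point, here is a minimal sketch of such a class, assuming scikit-learn is available. `RiskClassifier` is a hypothetical name, and the scaling step plus logistic regression are stand-ins for whatever the prototype ends up using. The data is synthetic and only demonstrates the interface.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class RiskClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical wrapper: feature scaling followed by logistic regression."""

    def __init__(self, C=1.0, max_iter=1000):
        # Store constructor arguments unchanged so get_params/set_params work.
        self.C = C
        self.max_iter = max_iter

    def fit(self, X, y):
        # Build the internal pipeline at fit time from the stored parameters.
        self.pipeline_ = Pipeline([
            ("scale", StandardScaler()),
            ("clf", LogisticRegression(C=self.C, max_iter=self.max_iter)),
        ])
        self.pipeline_.fit(X, y)
        self.classes_ = self.pipeline_.named_steps["clf"].classes_
        return self

    def predict(self, X):
        return self.pipeline_.predict(X)

    def predict_proba(self, X):
        return self.pipeline_.predict_proba(X)


# Synthetic three-class data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(90, 4))
y = np.repeat([0, 1, 2], 30)
X[:, 0] += y  # make the first feature informative
model = RiskClassifier().fit(X, y)
proba = model.predict_proba(X)
```

Because the class follows the estimator conventions, it should plug into tools such as `cross_val_score` or a parameter search without extra glue code.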