ML Model Implementation

VasLem commented 3 years ago

This is the next step after #10. We now have an idea of the dimensionality of the features and labels, which should be enough to create a model prototype based on the research. The model can be built using pipelines and by subclassing base classes, so that it is packaged as an sklearn-like class.
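For illustration, here is a minimal sketch of what such an sklearn-like class could look like. It assumes scikit-learn and NumPy; the class name `RiskModel`, the `Ridge` placeholder estimator and the one-pipeline-per-risk layout are hypothetical choices, not something already decided in this issue.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class RiskModel(BaseEstimator, RegressorMixin):
    """Hypothetical wrapper that fits one pipeline per label column."""

    def __init__(self, alpha=1.0):
        # Hyperparameters are stored unchanged so get_params/set_params work.
        self.alpha = alpha

    def fit(self, X, y):
        y = np.asarray(y, dtype=float)
        if y.ndim == 1:
            y = y.reshape(-1, 1)
        self.pipelines_ = []
        for j in range(y.shape[1]):
            # Ridge is only a placeholder for whichever model is chosen.
            pipe = Pipeline([
                ("scale", StandardScaler()),
                ("model", Ridge(alpha=self.alpha)),
            ])
            pipe.fit(X, y[:, j])
            self.pipelines_.append(pipe)
        return self

    def predict(self, X):
        # One prediction column per label, in the order seen during fit.
        return np.column_stack([p.predict(X) for p in self.pipelines_])


# Synthetic data, only to show the call pattern.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 4))
y_demo = rng.normal(size=(30, 3))
model = RiskModel(alpha=0.5).fit(X_demo, y_demo)
predictions = model.predict(X_demo)
```

Because the class derives from `BaseEstimator`, it can be passed to scikit-learn utilities such as `GridSearchCV` like any built-in estimator.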

OlympiaG commented 3 years ago

I tried to create a multinomial logistic regression classifier that predicts the probability of each class (the different risks). I am not at all sure whether this is the correct approach or whether the code is efficient. @bajo1207, could you check it and give me your opinion? If it is wrong or doesn't work, I will rewrite or change it (I can send you the .py).
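As a point of comparison, a minimal sketch of a multinomial logistic regression that returns class probabilities, assuming scikit-learn. The data is synthetic and the three classes are made up; this is not the .py mentioned above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 120 samples, 5 features, 3 made-up classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))
y = np.arange(120) % 3

# With the default lbfgs solver, a multiclass target is fitted with a
# multinomial (softmax) model.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X, y)

# One row per sample, one column per class; each row sums to 1.
proba = clf.predict_proba(X[:4])
```

The column order of `proba` follows `clf.classes_`.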

ekaan commented 3 years ago

@antosalerno do you need any assistance, or are you just changing the parameters to see which ones work best? I could try to help.

antosalerno commented 3 years ago

Exactly, that's what I'm doing. I'm performing cross-validation and storing the best parameters in a file.
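A minimal sketch of such a cross-validated parameter search, assuming scikit-learn. The estimator, the grid and the data are placeholders, not the ones actually used here.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = X[:, 0] + 0.1 * rng.normal(size=60)

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [20, 50], "max_depth": [2, 4]},
    scoring="neg_mean_squared_error",  # higher (closer to 0) is better
    cv=5,
)
search.fit(X, y)

# best_params_ is a plain dict, so it can be written to a file directly.
best_params = search.best_params_
```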

VasLem commented 3 years ago

@antosalerno could you please share the structure of the expected file, so that I can start working on the equivalent Python file implementation?

antosalerno commented 3 years ago

Would a dictionary saved as a .pkl file be OK?
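One possible layout, as a sketch only: a dictionary keyed by risk name, each value holding that risk's best parameters. The file name `best_params.pkl` and the numbers are made up for illustration; only the risk names come from this project.

```python
import pickle
import tempfile
from pathlib import Path

# Illustrative values only, not real tuning results.
best_params = {
    "Higher water prices": {"max_depth": 3, "n_estimators": 500},
    "Declining water quality": {"max_depth": 2, "n_estimators": 300},
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "best_params.pkl"  # hypothetical file name

    # Write the dictionary in binary mode.
    with path.open("wb") as fh:
        pickle.dump(best_params, fh)

    # Read it back, as the consuming Python module would.
    with path.open("rb") as fh:
        loaded = pickle.load(fh)
```

Since the content is just nested dictionaries of numbers, JSON would also work and is readable by hand, but pickle preserves Python types without conversion.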

VasLem commented 3 years ago

The following is the output of the Bayesian optimization. I have added more commits. @antosalerno, you can use the notebook XGBoost Fitting Using Bayesian Optimization.ipynb as a reference. I have run the algorithm end to end. I also added the elevation feature for each city; I just noticed I could retrieve it from the geocities :D. You are welcome to pull and experiment.

Risk: Higher water prices Samples Size: 87

iter	target	alpha	colsam...	gamma	max_depth	n_estimators
1	-1.029	10.69	0.6609	0.8468	6.822	474.6
2	-1.057	8.866	0.4458	0.9086	1.264	577.8
3	-1.02	12.46	0.3204	0.4169	3.166	934.4
4	-1.062	11.14	0.7036	0.2152	2.133	483.0
5	-1.099	3.302	0.7593	0.5758	4.68	407.2
6	-1.261	4.197	0.3899	0.6167	3.708	355.4
7	-1.543	0.7062	0.4681	0.9942	1.768	420.4
8	-1.065	16.42	0.544	0.1535	5.441	481.7
9	-1.306	4.866	0.5051	0.2079	5.595	869.3
10	-1.317	0.3069	0.7189	0.6719	4.606	952.0
11	-1.085	18.14	0.3	0.2972	2.536	926.1
12	-1.204	4.226	0.3	0.8667	1.504	926.5
13	-1.079	19.73	0.3395	0.4359	6.486	938.3
14	-1.073	18.56	0.9	0.0	1.0	468.6
15	-1.329	4.181	0.4546	0.6282	6.974	394.8
16	-1.091	6.004	0.3008	0.8962	2.36	466.8
17	-1.432	0.2857	0.9	0.0	7.0	478.9
18	-1.098	15.43	0.5263	0.7172	1.138	475.5
19	-1.084	15.68	0.3786	0.7065	6.862	462.9
20	-1.041	13.68	0.3	0.0	1.0	942.2

Risk: Inadequate or aging infrastructure Samples Size: 148

iter	target	alpha	colsam...	gamma	max_depth	n_estimators
1	-0.9487	3.223	0.4324	0.2213	6.649	901.8
2	-0.7952	7.654	0.6042	0.8987	2.619	887.6
3	-0.7991	13.49	0.7745	0.3991	1.687	890.5
4	-0.8813	6.473	0.4275	0.03099	4.635	856.4
5	-0.8075	16.57	0.7656	0.5099	2.545	412.5
6	-0.8715	5.347	0.6474	0.2598	4.645	650.2
7	-0.8213	17.52	0.4126	0.8959	1.846	286.9
8	-0.9036	2.187	0.3741	0.373	5.926	932.2
9	-0.9287	3.209	0.8861	0.2588	5.442	311.7
10	-0.8093	19.03	0.4964	0.03291	5.319	661.6
11	-0.7864	13.08	0.7076	1.0	1.0	882.6
12	-0.942	2.054	0.6671	0.5189	2.895	877.2
13	-0.8149	13.78	0.3	1.0	7.0	885.6
14	-0.806	18.86	0.7604	0.9814	2.055	884.3
15	-0.8846	5.936	0.5739	0.3046	4.683	410.1
16	-0.8096	18.49	0.615	0.27	3.388	421.8
17	-0.839	19.85	0.7831	0.1993	6.613	671.8
18	-0.8302	20.0	0.3	1.0	1.0	402.2
19	-0.8294	18.93	0.8319	0.1886	4.519	433.6
20	-0.8084	20.0	0.3129	0.1215	4.873	274.6

Risk: Increased water stress or scarcity Samples Size: 261

iter	target	alpha	colsam...	gamma	max_depth	n_estimators
1	-0.3626	2.766	0.4401	0.3459	5.671	613.4
2	-0.3388	7.764	0.3748	0.2113	4.651	944.9
3	-0.3506	2.365	0.3556	0.6195	6.547	955.9
4	-0.3364	6.016	0.8841	0.9525	2.226	460.3
5	-0.3388	17.84	0.3491	0.07574	5.649	953.1
6	-0.3588	18.97	0.877	0.3896	5.064	772.2
7	-0.3311	11.32	0.8818	0.6683	5.476	802.1
8	-0.3343	12.86	0.3972	0.07618	2.339	850.1
9	-0.3335	5.983	0.8199	0.8885	4.27	630.0
10	-0.3256	5.92	0.3834	0.8878	3.168	526.8
11	-0.3315	2.648	0.4765	0.7296	4.392	895.3
12	-0.3338	18.77	0.481	0.5995	1.645	803.7
13	-0.3317	13.77	0.3929	0.2846	3.817	525.9
14	-0.3744	1.682	0.6329	0.02555	6.661	517.7
15	-0.3286	9.322	0.4649	0.3631	3.073	530.0
16	-0.3459	2.841	0.5927	0.5091	1.658	532.3
17	-0.3276	9.349	0.4026	0.2734	6.366	528.0
18	-0.3298	12.92	0.8218	0.9314	4.781	809.7
19	-0.3396	5.678	0.6093	0.3203	1.765	807.0
20	-0.3583	19.56	0.8028	0.4251	6.82	813.8

Risk: Declining water quality Samples Size: 183

iter	target	alpha	colsam...	gamma	max_depth	n_estimators
1	-1.017	15.24	0.5021	0.4765	4.03	955.4
2	-1.146	3.951	0.3069	0.2203	3.108	673.4
3	-1.026	4.0	0.7201	0.9597	4.941	961.3
4	-1.141	9.27	0.7984	0.171	5.715	468.6
5	-0.9946	7.37	0.6244	0.01921	2.888	307.8
6	-1.134	6.55	0.5938	0.01857	6.075	554.2
7	-1.025	17.2	0.7216	0.841	5.813	325.2
8	-1.075	19.34	0.7808	0.1896	6.464	395.2
9	-0.9351	7.142	0.54	0.1324	1.457	764.5
10	-1.018	16.05	0.6447	0.4008	4.759	461.8
11	-0.9565	7.743	0.5571	0.3126	2.624	307.9
12	-0.8607	6.901	0.5264	0.9323	1.247	763.3
13	-0.8679	7.995	0.8548	0.9833	1.993	763.2
14	-0.9363	7.28	0.8611	0.5022	2.088	761.0
15	-1.057	5.228	0.6352	0.4284	3.399	763.0
16	-1.059	9.406	0.4173	0.1764	3.534	762.6
17	-0.9167	8.846	0.542	0.05328	1.301	309.6
18	-0.8616	9.363	0.6971	0.9884	1.129	763.9
19	-0.9114	11.23	0.8064	0.8312	2.234	308.9
20	-0.8564	7.754	0.4608	0.7676	1.014	762.1

Risk: Increased water demand Samples Size: 98

iter	target	alpha	colsam...	gamma	max_depth	n_estimators
1	-1.217	18.02	0.6812	0.5451	6.896	643.6
2	-1.229	19.69	0.4645	0.7135	5.75	721.9
3	-1.222	16.56	0.8659	0.4097	5.167	530.5
4	-1.209	19.16	0.6947	0.9955	5.455	651.1
5	-1.154	12.16	0.3491	0.1814	4.748	410.0
6	-1.193	19.48	0.8924	0.8026	1.26	248.8
7	-1.186	8.315	0.3632	0.9371	1.101	847.2
8	-1.222	11.47	0.671	0.6548	4.981	650.4
9	-1.175	15.08	0.5308	0.3679	6.299	555.1
10	-1.236	17.97	0.6862	0.1582	3.3	737.8
11	-1.183	11.06	0.5585	0.6782	5.373	409.1
12	-1.183	12.51	0.4233	0.1829	3.819	410.7
13	-1.183	11.49	0.5731	0.2041	5.964	410.1
14	-1.192	12.78	0.7163	0.9359	5.117	410.6
15	-1.172	10.66	0.3185	0.1697	3.579	409.3
16	-1.226	13.66	0.8328	0.2321	5.406	408.6
17	-1.203	11.45	0.5863	0.5618	4.004	410.6
18	-1.222	9.69	0.5864	0.08564	4.079	408.6
19	-1.178	8.321	0.3652	0.7539	1.638	847.6
20	-1.152	12.15	0.857	0.2001	3.915	410.0

Risk: Regulatory Samples Size: 65

iter	target	alpha	colsam...	gamma	max_depth	n_estimators
1	-0.6468	15.19	0.8833	0.1425	1.581	478.6
2	-0.6468	14.5	0.6182	0.6762	4.132	506.6
3	-0.7314	7.599	0.5093	0.5005	5.549	449.4
4	-0.6495	12.97	0.7692	0.7693	6.901	950.7
5	-0.6468	15.29	0.3974	0.7186	4.209	551.4
6	-0.7587	5.216	0.8815	0.8894	4.746	758.1
7	-0.6468	18.41	0.3315	0.4609	5.848	485.1
8	-0.6468	13.9	0.5407	0.595	3.143	348.1
9	-0.6468	16.1	0.4538	0.407	1.782	290.3
10	-0.8251	4.691	0.6606	0.373	5.543	788.6
11	-0.9881	0.0	0.9	0.9517	1.0	618.7
12	-0.8716	0.3495	0.4567	0.1852	4.343	213.2
13	-0.9698	2.446	0.8587	0.3136	3.82	999.7
14	-0.6468	12.5	0.664	0.9763	2.782	348.4
15	-0.6468	20.0	0.4687	1.0	7.0	904.0
16	-0.938	0.0	0.9	0.0	7.0	317.0
17	-0.6468	17.74	0.9	1.0	1.0	375.6
18	-1.197	0.0	0.9	0.0	1.0	924.7
19	-0.9668	0.0	0.3	1.0	1.0	492.3
20	-0.6739	13.74	0.3	0.0	7.0	362.6

Risk: Energy supply issues Samples Size: 59

iter	target	alpha	colsam...	gamma	max_depth	n_estimators
1	-0.6747	1.412	0.7192	0.7479	2.893	656.5
2	-0.6753	6.899	0.3909	0.7769	3.047	501.6
3	-0.7638	0.4693	0.4836	0.5069	5.302	690.3
4	-0.7093	1.515	0.7129	0.205	4.67	657.3
5	-0.6518	18.39	0.8179	0.3172	6.033	759.0
6	-0.6515	13.71	0.7813	0.5914	5.963	740.3
7	-0.6518	16.69	0.7496	0.4808	2.56	485.7
8	-0.6518	19.3	0.6752	0.9545	1.855	787.9
9	-0.6516	13.54	0.7421	0.4082	3.906	304.4
10	-0.6509	13.06	0.6228	0.8618	4.04	909.1
11	-0.6518	15.0	0.8898	0.3295	2.928	908.8
12	-0.6518	15.03	0.4379	0.7111	4.273	913.7
13	-0.6288	9.693	0.4822	0.1611	1.029	913.5
14	-0.6448	8.011	0.699	0.7634	4.769	915.0
15	-0.676	5.709	0.3277	0.8865	1.858	911.2
16	-0.6493	11.84	0.6053	0.4917	1.769	917.5
17	-0.6545	6.212	0.3351	0.541	1.303	918.5
18	-0.6515	13.68	0.5229	0.237	6.617	746.9
19	-0.6518	18.24	0.3344	0.6481	5.667	752.2
20	-0.6518	18.07	0.7913	0.7944	2.729	744.6
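The optimizer proposes continuous values, so `max_depth` and `n_estimators` need to become integers before a model is built from a row of these tables. Below is a minimal, standard-library sketch that picks the best trial from such a log and converts those two columns. It assumes the target is maximized and that rounding is acceptable; the notebook may truncate instead. The sample rows are trials 12, 18 and 20 of "Declining water quality", and the truncated `colsam...` header is kept exactly as printed.

```python
# Three rows copied from the "Declining water quality" table above.
LOG = """\
iter target alpha colsam... gamma max_depth n_estimators
12 -0.8607 6.901 0.5264 0.9323 1.247 763.3
18 -0.8616 9.363 0.6971 0.9884 1.129 763.9
20 -0.8564 7.754 0.4608 0.7676 1.014 762.1
"""

# Columns that must be integers before being passed to a model.
INTEGER_PARAMS = ("max_depth", "n_estimators")


def best_trial(log_text):
    """Return (iteration, target, params) for the trial with the highest target."""
    lines = log_text.strip().splitlines()
    header = lines[0].split()
    rows = [dict(zip(header, map(float, line.split()))) for line in lines[1:]]
    best = max(rows, key=lambda row: row["target"])
    params = {k: v for k, v in best.items() if k not in ("iter", "target")}
    for name in INTEGER_PARAMS:
        # Assumption: round to the nearest integer.
        params[name] = int(round(params[name]))
    return int(best["iter"]), best["target"], params


iteration, target, params = best_trial(LOG)
```

For these three rows the function selects trial 20 (target -0.8564), with `max_depth` rounded to 1 and `n_estimators` to 762.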

antosalerno commented 3 years ago

Great @VasLem!

VasLem commented 3 years ago

This is not the final version, I just plugged things in. If any of you have managed to get better results, please commit them!!! Also, I cheated a little there: I didn't use a test set, only the CV :fearful:
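For reference, a minimal sketch of how a held-out test set could be added on top of the cross-validation, assuming scikit-learn. `Ridge`, the grid and the data are placeholders; the point is only that the test split is never seen during tuning.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = X @ np.array([1.0, -0.5, 0.0, 0.2]) + 0.1 * rng.normal(size=100)

# Set the test split aside before any tuning happens.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Cross-validation runs on the training portion only.
search = GridSearchCV(
    Ridge(),
    param_grid={"alpha": [0.1, 1.0, 10.0]},
    scoring="neg_mean_squared_error",
    cv=5,
)
search.fit(X_train, y_train)

# The refitted best model is scored once on the untouched test split.
test_mse = mean_squared_error(y_test, search.predict(X_test))
```

With sample sizes as small as 59 to 261 per risk, the test split will be noisy, so the CV score is still worth reporting next to it.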

VasLem commented 3 years ago

@OlympiaG do we have any news about the Bagging option? Currently I report an MSE close to 1 for most risks, which sucks a little...
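In case it helps as a starting point, a minimal sketch of a bagging regressor evaluated by cross-validated MSE, assuming scikit-learn. The base tree, its depth, the number of estimators and the data are all placeholders rather than a tested configuration for this project.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data.
rng = np.random.default_rng(0)
X = rng.normal(size=(90, 5))
y = X[:, 0] - 0.5 * X[:, 1] + 0.3 * rng.normal(size=90)

# The base estimator is passed positionally: its keyword name differs
# between scikit-learn versions.
bagged = BaggingRegressor(
    DecisionTreeRegressor(max_depth=3),
    n_estimators=50,
    random_state=0,
)

# The scorer returns negative MSE, so flip the sign to report MSE.
scores = cross_val_score(bagged, X, y, scoring="neg_mean_squared_error", cv=5)
cv_mse = -scores.mean()
```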

MDAIceland / WaterSecurity

ML Model Implementation #31