MDAIceland / WaterSecurity


Model handler #42

Closed VasLem closed 3 years ago

VasLem commented 3 years ago

I know there is still some documentation left to write... :sweat: For now I have used a dummy dataset, economy_preprocessed.csv, as the unlabeled dataset. From it, I managed to take the two-letter country code and compare it with the country returned by the API I am using. This API can return the names of cities and countries anywhere in the world (I have signed up here). It is a very nice API, but our quota is 2000 requests per month, so we will probably need to update the code with a fresh username before submission. @ekaan and @bajo1207, do you think we can also keep the two-letter country code column in the final dataset? I think it would help avoid overhead.
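A minimal sketch of the country matching described above, assuming the dataset stores the ISO 3166-1 alpha-2 code in a column called `country_code` (a hypothetical name) and that the API response has already been reduced to a two-letter code. `lookup_country` is a hypothetical stub standing in for the real API call; wrapping it in `functools.lru_cache` is one way to avoid spending the monthly quota on repeated lookups of the same city.

```python
import csv
import io
from functools import lru_cache


@lru_cache(maxsize=None)
def lookup_country(city: str) -> str:
    """Hypothetical stand-in for the geolocation API call.

    Cached so repeated lookups of the same city do not consume the
    monthly request quota. Replace the dictionary with a real request.
    """
    fake_responses = {"Reykjavik": "IS", "Athens": "GR"}
    return fake_responses.get(city, "")


def rows_for_city(csv_text: str, city: str, code_column: str = "country_code") -> list[dict]:
    """Return dataset rows whose two-letter code matches the API's country."""
    api_code = lookup_country(city).strip().upper()
    reader = csv.DictReader(io.StringIO(csv_text))
    # Compare case-insensitively so "is" and "IS" are treated as the same code.
    return [row for row in reader if row[code_column].strip().upper() == api_code]


sample = "country_code,gdp\nIS,25.0\nGR,214.9\nis,1.0\n"
print([row["gdp"] for row in rows_for_city(sample, "Reykjavik")])  # ['25.0', '1.0']
```

Keeping the code column in the final dataset means this comparison is a plain string match, with no extra lookup to translate country names.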

Once this is pulled:

@OlympiaG and @adriana-madi: add anything related to the classifier model in classification/classifier.py. The expected input is the features and a label from 0 to 3.

@antosalerno: add anything related to feature selection in classification/feature_selection.py. It is supposed to receive the features and a label from 0 to 3.
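Since both modules receive the same inputs, a small shared check can catch malformed data early. This is a sketch under assumptions: the function name `validate_training_data` is hypothetical, and features are taken to be a list of equal-length numeric rows.

```python
from typing import Sequence

SEVERITY_LEVELS = range(4)  # valid labels: 0, 1, 2, 3


def validate_training_data(features: Sequence[Sequence[float]], labels: Sequence[int]) -> None:
    """Hypothetical check of the input contract shared by classifier and feature selection."""
    if len(features) != len(labels):
        raise ValueError(f"{len(features)} feature rows but {len(labels)} labels")
    bad = sorted({y for y in labels if y not in SEVERITY_LEVELS})
    if bad:
        raise ValueError(f"labels outside 0-3: {bad}")
    widths = {len(row) for row in features}
    if len(widths) > 1:
        raise ValueError(f"feature rows have differing lengths: {sorted(widths)}")


validate_training_data([[0.1, 0.2], [0.3, 0.4]], [0, 3])  # passes silently
```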

The classifiers and feature selection will be fitted separately to each of the different risks, so we will get 7 pipelines, which will be pickled and ready to load once the app starts. Each pipeline will predict the severity level of one risk. The selected features do not have to be the same across pipelines. Of course, this plan is just what I came up with to start coding, and it will quite likely change if we run into difficulties along the way. For example, we could apply feature selection as an umbrella step before all the classifiers, or, instead of calculating 7 predictions, we could compute only one.
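A sketch of the one-pipeline-per-risk idea, assuming nothing about the final models: `select_features` is a hypothetical stand-in for classification/feature_selection.py that keeps the `k` columns with the largest absolute covariance with the label, and the "model" is a placeholder storing the mean label. Two placeholder risks, `risk_a` and `risk_b`, stand in for the seven real ones, and the result is pickled to bytes rather than to a file.

```python
import pickle
from statistics import mean


def select_features(features, labels, k):
    """Hypothetical selector: keep the k columns with the largest |covariance| with the label."""
    y_bar = mean(labels)
    scores = []
    for j in range(len(features[0])):
        col = [row[j] for row in features]
        c_bar = mean(col)
        cov = sum((c - c_bar) * (y - y_bar) for c, y in zip(col, labels)) / len(labels)
        scores.append((abs(cov), j))
    return sorted(j for _, j in sorted(scores, reverse=True)[:k])


def build_pipelines(features, labels_per_risk, k=1):
    """Fit one pipeline per risk; the selected columns may differ between risks."""
    pipelines = {}
    for risk, labels in labels_per_risk.items():
        selected = select_features(features, labels, k)
        # Placeholder "model": the mean label. classifier.py would plug in here.
        pipelines[risk] = {"selected": selected, "mean_label": mean(labels)}
    return pipelines


features = [[0, 0], [0, 1], [1, 0], [1, 1]]
labels_per_risk = {"risk_a": [0, 0, 3, 3], "risk_b": [0, 3, 0, 3]}
pipelines = build_pipelines(features, labels_per_risk)
blob = pickle.dumps(pipelines)  # what the app would load at startup
print({risk: p["selected"] for risk, p in pipelines.items()})  # {'risk_a': [0], 'risk_b': [1]}
```

Here `risk_a` follows the first column and `risk_b` the second, so each pipeline keeps a different feature, which is exactly the freedom described above.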

The interesting part is in model/model_handler.py, where this logic is implemented.
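A hypothetical sketch of what a handler like this might do: unpickle every per-risk pipeline once at startup, then return one score per risk for each incoming feature row. The class name `ModelHandler` mirrors the file name, but its interface is an assumption, and each pickled entry is modeled as a plain dict describing a linear model on that risk's selected features. A pickled scikit-learn pipeline would expose its own `predict` instead.

```python
import pickle


class ModelHandler:
    """Hypothetical sketch: load all per-risk pipelines once, then predict on demand."""

    def __init__(self, blob: bytes):
        # Done once, when the app starts. Only unpickle data the project produced itself.
        self.pipelines = pickle.loads(blob)

    def predict(self, row: list[float]) -> dict[str, float]:
        scores = {}
        for risk, p in self.pipelines.items():
            x = [row[i] for i in p["selected"]]  # this risk's selected features only
            scores[risk] = p["bias"] + sum(w * v for w, v in zip(p["weights"], x))
        return scores


blob = pickle.dumps({
    "risk_0": {"selected": [0, 2], "weights": [0.5, 1.0], "bias": 0.0},
    "risk_1": {"selected": [1], "weights": [2.0], "bias": 0.5},
})
handler = ModelHandler(blob)
print(handler.predict([2.0, 1.0, 0.5]))  # {'risk_0': 1.5, 'risk_1': 2.5}
```

Loading in the constructor keeps the unpickling cost out of the request path; the Python documentation warns that `pickle.loads` can execute arbitrary code, so the pickles should never come from untrusted sources.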

The label from 0 to 3 is the risk severity. @antosalerno, I took into consideration what you were saying, so I think it might be better to train a regression model.
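One reason regression can suit an ordered 0 to 3 scale is that predicting 1 when the truth is 3 costs more than predicting 2, whereas plain classification treats both mistakes equally. A regressor's output is continuous, though, so it has to be mapped back to a severity level. A minimal sketch, assuming simple rounding and clipping (`to_severity` is a hypothetical helper name):

```python
def to_severity(prediction: float, low: int = 0, high: int = 3) -> int:
    """Map a continuous regression output onto the 0-3 severity scale.

    Note: Python's round() uses round-half-to-even, so 2.5 maps to 2.
    """
    return int(min(max(round(prediction), low), high))


print([to_severity(p) for p in (-0.4, 1.6, 2.5, 7.2)])  # [0, 2, 2, 3]
```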

The three of you will probably need to use a notebook first to check data integrity. Let's talk more about it tomorrow.
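As a starting point for that notebook, a stdlib-only sketch that counts empty cells per column and exact duplicate rows in a CSV file. `integrity_report` is a hypothetical name, and it assumes every row has the same number of fields as the header.

```python
import csv
import io
from collections import Counter


def integrity_report(csv_text: str) -> dict:
    """Count empty cells per column and exact duplicate rows (assumes well-formed rows)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = Counter()
    seen = Counter()
    for row in reader:
        for column, value in row.items():
            if value is None or value.strip() == "":
                missing[column] += 1
        seen[tuple(row.values())] += 1
    duplicates = sum(n - 1 for n in seen.values() if n > 1)
    return {"missing": dict(missing), "duplicate_rows": duplicates}


print(integrity_report("country_code,gdp\nIS,25.0\nGR,\nIS,25.0\n"))
# {'missing': {'gdp': 1}, 'duplicate_rows': 1}
```

In a notebook with pandas, `df.isna().sum()` and `df.duplicated().sum()` give the same two numbers, since `read_csv` reads empty cells as NaN by default.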

VasLem commented 3 years ago

Oh, and of course you can try running the app with "python run.py".

VasLem commented 3 years ago

Made various fixes here and there and added documentation.