MDAIceland / WaterSecurity


Feature Selection/Generation #29

Closed VasLem closed 3 years ago

VasLem commented 3 years ago

Once #24 is done, the next important step is to compare the predictive power of the existing features. There are several ways to do that. A feature generation step (e.g. PCA) may also be required. To make sure the features are well calibrated, we could examine each feature separately (univariate analysis) before plugging them into the model decided on and implemented based on #10.
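As a rough illustration of the univariate examination mentioned above, the sketch below scores each feature on its own against the target and ranks them. The thread does not say which scoring method is intended, so absolute Pearson correlation is used here only as a simple stand-in; the feature names, values, and helper functions are all hypothetical.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (var_x * var_y) ** 0.5


def rank_features(columns, target):
    """Score every feature independently and sort from strongest to weakest."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data, not taken from the project's dataset.
features = {
    "rainfall": [1.0, 2.0, 3.0, 4.0, 5.0],
    "population": [5.0, 3.0, 4.0, 1.0, 2.0],
    "constant": [7.0, 7.0, 7.0, 7.0, 7.0],
}
label = [1.1, 1.9, 3.2, 3.9, 5.1]

ranking = rank_features(features, label)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

A score like this only captures linear, one-feature-at-a-time relationships, so it would complement rather than replace a model-based comparison.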

bajo1207 commented 3 years ago

For the feature selection, how do we want the unlabeled data to be structured? As one big dataframe?

VasLem commented 3 years ago

Yes, this dataset needs to be merged with the labeled dataset before the feature selection. We will also need to define a train/test split that stays fixed throughout the process, so that we can use the test set to see how the trained model behaves.
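One possible way to keep the split stable is sketched below, under stated assumptions: plain lists of dicts stand in for the dataframes, `city` is a hypothetical row identifier, and the hash-based assignment is just one option, not something decided in this thread. Because each row's assignment depends only on its identifier, the split does not change between runs or when new rows are added.

```python
import hashlib


def is_test(row_id, test_percent=20):
    """Deterministically assign a row to the test set from its identifier."""
    digest = hashlib.md5(str(row_id).encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99
    return bucket < test_percent


# Hypothetical toy rows; the column names are illustrative only.
labeled = [
    {"city": "A", "feature": 1.0, "label": 2},
    {"city": "B", "feature": 2.5, "label": 4},
    {"city": "C", "feature": 0.7, "label": 1},
]
unlabeled = [
    {"city": "D", "feature": 3.1},
    {"city": "E", "feature": 1.8},
]

# Merge into one table; unlabeled rows get an explicit missing label.
merged = labeled + [dict(row, label=None) for row in unlabeled]

# Only labeled rows take part in the train/test split.
labeled_rows = [row for row in merged if row["label"] is not None]
train = [row for row in labeled_rows if not is_test(row["city"])]
test = [row for row in labeled_rows if is_test(row["city"])]
```

With very few labeled rows, either subset can end up empty, so the test percentage would need to be checked against the real dataset size.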

VasLem commented 3 years ago

@antosalerno the dataset is there, so you can start working on it.

VasLem commented 3 years ago

@ekaan please coordinate as well, so that you can produce the graphs you were talking about in #35.

ekaan commented 3 years ago

The necessary documentation has been added as comments in the feature selection and feature generation script (feature_selection.py under classification).