Closed by VasLem 3 years ago
For the feature selection, in what form do we want the unlabeled data? One big dataframe?
Yes, this dataset needs to be merged with the labeled dataset before the feature selection. We will also need to define a train/test split that stays fixed throughout the process, so that we can use the test set to evaluate the trained model.
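One possible way to keep the split fixed, as a rough sketch only: assign each labeled sample to train or test by hashing its identifier, so the assignment does not depend on row order or on how many unlabeled rows get merged in. Everything named here is an assumption for illustration (the `id` and `label` keys, the `split` column, the 20% test fraction, the salt string, and the choice to tag unlabeled rows as `"unlabeled"`); none of it comes from the actual scripts.

```python
import hashlib


def assign_split(sample_id, test_fraction=0.2, salt="split-v1"):
    """Deterministically map a sample id to 'train' or 'test'.

    The same id always gets the same answer, regardless of row order
    or of which other rows are present in the dataset.
    """
    digest = hashlib.sha256(f"{salt}:{sample_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and scale it to [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "test" if bucket < test_fraction else "train"


def merge_and_split(labeled_rows, unlabeled_rows, id_key="id", label_key="label"):
    """Merge labeled and unlabeled rows into one list with a 'split' field.

    Assumption: unlabeled rows carry no label, so they are tagged
    'unlabeled' and kept out of the train/test evaluation split.
    """
    merged = []
    for row in labeled_rows:
        merged.append({**row, "split": assign_split(row[id_key])})
    for row in unlabeled_rows:
        merged.append({**row, label_key: None, "split": "unlabeled"})
    return merged


# Hypothetical toy data, only to show the shape of the result.
labeled = [{"id": "s1", "f1": 0.3, "label": 1}, {"id": "s2", "f1": 0.9, "label": 0}]
unlabeled = [{"id": "u1", "f1": 0.5}]
merged = merge_and_split(labeled, unlabeled)
```

If the data lives in a pandas dataframe, the same function could presumably be applied to the identifier column with `Series.map`. Changing the salt would produce a different but equally stable split, so it should be fixed once and left alone.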
@antosalerno the dataset is there, you can start working on it
@ekaan please coordinate as well, so that you can produce the graphs you were talking about in #35
The necessary documentation has been added as comments in the feature selection and feature generation script (feature_selection.py under classification)
Once #24 is done, the next important step is to compare the predictive power of the existing features. There are several ways to do that. A feature generation step (e.g. PCA) may also be required. To make sure the features are well calibrated, we could examine each feature separately (univariate analysis) before plugging them into the model decided on and implemented in #10.
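As a rough illustration of what the univariate examination could look like, the sketch below scores each feature on its own by the ROC AUC it achieves when used directly as a classifier score, then ranks the features. It assumes binary 0/1 labels and numeric features, which the issue does not state, and the names `univariate_auc`, `rank_features` and the `label` key are made up for this example. AUC is only one of the several possible measures. The pairwise loop is quadratic in the number of samples, so it is meant for small checks, not for the full dataset.

```python
def univariate_auc(values, labels):
    """ROC AUC of a single feature used directly as a score for 0/1 labels.

    Computed as the fraction of (positive, negative) pairs in which the
    positive sample has the higher feature value; ties count as half.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_features(rows, feature_names, label_key="label"):
    """Return (feature, score) pairs sorted from most to least separating.

    A feature with AUC near 0 separates the classes as well as one with
    AUC near 1 (just in the opposite direction), so the score used for
    ranking is max(auc, 1 - auc), which lies in [0.5, 1.0].
    """
    labels = [row[label_key] for row in rows]
    scores = {}
    for name in feature_names:
        auc = univariate_auc([row[name] for row in rows], labels)
        scores[name] = max(auc, 1.0 - auc)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy rows: 'a' separates the classes, 'b' is constant,
# 'c' separates them in the inverse direction.
rows = [
    {"a": 1.0, "b": 5.0, "c": 4.0, "label": 0},
    {"a": 2.0, "b": 5.0, "c": 3.0, "label": 0},
    {"a": 3.0, "b": 5.0, "c": 2.0, "label": 1},
    {"a": 4.0, "b": 5.0, "c": 1.0, "label": 1},
]
ranking = rank_features(rows, ["a", "b", "c"])
```

To keep the test set untouched, a ranking like this would be computed on the training rows only.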