Updated main.py: I've separated reading the CLI arguments from the actual preprocessing; the parsed values are now passed in as function arguments. I've also added helper functions for the test and train set sizes, so we can change the sampling size cleanly without touching a lot of hardcoded values.
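For reviewers, here is a minimal sketch of the shape this takes. It is not the actual code in main.py: the flag names (`--sample-size`, `--test-fraction`), their defaults, and the function names `parse_args`, `get_test_size`, `get_train_size`, and `preprocess` are all hypothetical and only illustrate the split between argument parsing and preprocessing.

```python
import argparse


def parse_args(argv=None):
    """Read CLI arguments only; no preprocessing happens here."""
    # Flag names and defaults are illustrative assumptions.
    parser = argparse.ArgumentParser(description="Preprocess the dataset")
    parser.add_argument("--sample-size", type=int, default=1000)
    parser.add_argument("--test-fraction", type=float, default=0.2)
    return parser.parse_args(argv)


def get_test_size(sample_size, test_fraction):
    """Number of rows that go into the test set."""
    return int(round(sample_size * test_fraction))


def get_train_size(sample_size, test_fraction):
    """Number of rows left for the train set."""
    return sample_size - get_test_size(sample_size, test_fraction)


def preprocess(sample_size, test_fraction):
    """Takes plain values, so it can be called without any CLI involved."""
    # Placeholder body: the real preprocessing would sample and split here.
    return {
        "train": get_train_size(sample_size, test_fraction),
        "test": get_test_size(sample_size, test_fraction),
    }


def main(argv=None):
    args = parse_args(argv)
    return preprocess(args.sample_size, args.test_fraction)
```

With this layout, changing the sampling size means changing one argument, and `preprocess` can be called directly from tests or a notebook without going through the command line.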
I've removed the .pkl files from the repo and the write logic from the features; those files can no longer be used because of the sampling logic. We could replace them with a single .pkl that stores the data after the features are generated, but I haven't done that yet.
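If we do go with the single-file option, a rough sketch could look like the following. This is not implemented in this PR; `save_features`, `load_features`, the file name `features.pkl`, and the example data are hypothetical, and only the standard-library `pickle` module is assumed.

```python
import pickle
import tempfile
from pathlib import Path


def save_features(features, path):
    """Write the fully generated features to a single pickle file."""
    with Path(path).open("wb") as f:
        pickle.dump(features, f)


def load_features(path):
    """Read the features back from the pickle file."""
    with Path(path).open("rb") as f:
        return pickle.load(f)


# Round trip in a temporary directory, using made-up data.
with tempfile.TemporaryDirectory() as tmp:
    cache_path = Path(tmp) / "features.pkl"
    save_features({"rows": [1, 2, 3], "sample_size": 3}, cache_path)
    restored = load_features(cache_path)
```

The file would still need to be regenerated whenever the sample changes, since a cache built from one sample is not valid for another.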
@TomBrunner @MichielvdBerg please look this over; I had to change some logic here and there to make the data types match up.