Dev pipeline order change

This branch reorders the data preparation pipeline. The old order was clean -> select records -> engineer features -> select features; the new order is clean -> engineer features -> select records -> scale features -> select features. Major changes and key takeaways:

- The output of feature engineering is a single .csv file that can be shared with other parties.
- The outputs of select records, scale features, and select features are .h5 files and can be used to train models.
- During select records, the dataset is split both by objective AND into train/test sets.
- Feature scaling is optional and can be skipped entirely.
- Feature scaling now also supports normalization by class count or by line count.
- The year, the source, and the class/line counts are kept in the dataframes as metadata; these columns are prefixed with metadata_.
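The new stage order can be pictured as a chain in which each stage consumes the previous stage's output. The following is a minimal sketch, not the repository's code: the stage identifiers, run_pipeline, and the tracing stand-in stages are hypothetical, and real stages would read and write dataframes and .csv/.h5 files instead of appending names to a list. It also shows the optional scaling stage being skipped.

```python
from typing import Any, Callable, Dict, Iterable, List

# Stage names follow the new order described above; the identifiers are hypothetical.
NEW_ORDER = ["clean", "engineer_features", "select_records", "scale_features", "select_features"]

Stage = Callable[[Any], Any]


def run_pipeline(data: Any, stages: Dict[str, Stage], order: Iterable[str], skip: Iterable[str] = ()) -> Any:
    """Feed the output of each stage in `order` into the next, skipping stages listed in `skip`."""
    skipped = set(skip)
    for name in order:
        if name in skipped:
            continue
        data = stages[name](data)
    return data


def make_tracing_stage(name: str) -> Stage:
    """Hypothetical stand-in for a real stage: it only records that it ran."""
    def stage(trace: List[str]) -> List[str]:
        return trace + [name]
    return stage


stages = {name: make_tracing_stage(name) for name in NEW_ORDER}

print(run_pipeline([], stages, NEW_ORDER))
# ['clean', 'engineer_features', 'select_records', 'scale_features', 'select_features']
print(run_pipeline([], stages, NEW_ORDER, skip={"scale_features"}))
# ['clean', 'engineer_features', 'select_records', 'select_features']
```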
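Normalization by class or line count, combined with the metadata_ prefix, might look roughly like the dependency-free sketch below. It assumes each record is a plain dict rather than a dataframe row, and the column names (metadata_class_count, metadata_line_count, feature_a, and so on) are hypothetical; only the metadata_ prefix comes from the description above. Feature columns are divided by the chosen count, while metadata columns pass through unchanged.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]

# Metadata columns carry this prefix; the concrete column names below are hypothetical.
METADATA_PREFIX = "metadata_"


def feature_columns(record: Record) -> List[str]:
    """Return the feature columns, i.e. every column not prefixed with metadata_."""
    return [col for col in record if not col.startswith(METADATA_PREFIX)]


def normalize_by_count(records: List[Record], count_column: str) -> List[Record]:
    """Divide every feature by the record's value in count_column; metadata is copied unchanged."""
    normalized = []
    for record in records:
        count = record[count_column]
        if count <= 0:
            raise ValueError(f"{count_column} must be positive, got {count}")
        scaled = dict(record)  # shallow copy, so the input records are not modified
        for col in feature_columns(record):
            scaled[col] = record[col] / count
        normalized.append(scaled)
    return normalized


records = [
    {
        "metadata_year": 2015,
        "metadata_source": "source_a",
        "metadata_class_count": 10,
        "metadata_line_count": 2000,
        "feature_a": 5,
        "feature_b": 20,
    }
]

by_class = normalize_by_count(records, "metadata_class_count")
by_line = normalize_by_count(records, "metadata_line_count")
print(by_class[0]["feature_a"], by_class[0]["feature_b"])  # 0.5 2.0
print(by_line[0]["feature_a"], by_line[0]["feature_b"])    # 0.0025 0.01
```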

Pull request #31 in adamjanovsky / AndroidMalwareCrypto