maxwest-uw closed 1 week ago
| Before [e42321b3] | After [4d964c91] | Ratio | Benchmark (Parameter) |
|---|---|---|---|
| 148±2ms | 150±2ms | 1.02 | benchmarks.time_learn_loop('KNN', 'RandomSampling') |
| 140±5ms | 141±7ms | 1.01 | benchmarks.time_feature_creation |
| 197M | 197M | 1 | benchmarks.peakmem_learn_loop('KNN') |
| 188M | 188M | 1 | benchmarks.peakmem_learn_loop('RandomForest') |
| 153±1ms | 153±2ms | 1 | benchmarks.time_learn_loop('KNN', 'UncSampling') |
| 2.56±0.01s | 2.56±0.02s | 1 | benchmarks.time_learn_loop('RandomForest', 'RandomSampling') |
| 2.58±0.02s | 2.58±0.01s | 1 | benchmarks.time_learn_loop('RandomForest', 'UncSampling') |
Change Description
Unifies the feature output format so that all features are written by the same function.
Part 1 of #78
Solution Description
Previously, the code wrote each line individually, building the comma-separated structure on the fly. Instead of writing per line, we now generate all of the features, place them in a `pandas.DataFrame`, and write them with the pandas `to_csv` function. This unifies our output method, ensures that all the data is uniform, and ensures that we fail gracefully in the event of issues with the features. This also prepares us for saving feature results to MongoDB in a future update.
Code Quality
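For reviewers, here is a minimal sketch of the pattern described in the Solution Description: collect every feature row first, then write once through `DataFrame.to_csv`. The names `compute_features` and `write_features`, the column names, and the sample records are hypothetical and are not taken from this PR's code.

```python
import io

import pandas as pd


def compute_features(record):
    """Hypothetical per-record feature computation (illustration only)."""
    values = record["values"]
    return {
        "id": record["id"],
        "mean": sum(values) / len(values),
        "max": max(values),
    }


def write_features(records, destination):
    """Build all feature rows, then write them in a single to_csv call."""
    # Any error in feature computation is raised here, before output is written.
    rows = [compute_features(record) for record in records]
    # Explicit columns keep the output layout uniform across callers.
    frame = pd.DataFrame(rows, columns=["id", "mean", "max"])
    frame.to_csv(destination, index=False)
    return frame


# Sample input, invented for this sketch.
records = [
    {"id": "a", "values": [1.0, 2.0, 3.0]},
    {"id": "b", "values": [4.0, 6.0]},
]
buffer = io.StringIO()  # stands in for a file path or open file handle
frame = write_features(records, buffer)
csv_text = buffer.getvalue()
```

Because the rows are assembled before anything is written, a failure while computing features in this sketch leaves no partially written file. Keeping the intermediate `DataFrame` should also make it easier to hand the same data to another sink later.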
Project-Specific Pull Request Checklists
Other Change Checklist