MengfeiJiang closed 8 years ago
Sure, I will take the plot script and part of the model script in clustering.
Maybe wrap some steps into functions, such as fitting the model.
@MengfeiJiang @pttzty So we are not using the 2-hr bin features for clustering, is that right? @ye-zhi @lizhua @lichenzhi The new data loading function for 2-hr bin features is merged, so please continue classification using it. Classification performs better with the new data structure.
@changsiyao Mengfei and I are not using the 2-hour bin features for clustering. That many features would run into the curse of dimensionality, since we are using Euclidean distance.
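To illustrate the concern, here is a minimal sketch on uniform random data (not our dataset) of how Euclidean distances tend to concentrate as the number of features grows. The function name `relative_contrast` and the point counts and dimensions are made up for the illustration.

```python
import math
import random


def relative_contrast(n_points, n_dims, seed=0):
    """Return (max - min) / min over Euclidean distances from one point to the rest."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(n_dims)] for _ in range(n_points)]
    query, others = points[0], points[1:]
    dists = [math.dist(query, p) for p in others]
    return (max(dists) - min(dists)) / min(dists)


# Hypothetical sizes: 200 random points, few features vs. many features.
low_dim = relative_contrast(200, 2)
high_dim = relative_contrast(200, 500)
print(f"relative contrast with 2 features:   {low_dim:.2f}")
print(f"relative contrast with 500 features: {high_dim:.2f}")
```

When the contrast is small, the nearest and farthest points are almost equally far away, so distance-based cluster assignments carry less information.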
I'm still a little confused about what we should output from our classification models. My current idea is to output at least the model, its parameters, and the CV performance measures. But should we also output the predictions we make? If so, our functions need to take train_x, train_y, and test_x as separate arguments. (Currently we randomly split the data into training and test sets inside each function, so outputting test predictions doesn't make much sense.) @ye-zhi @lichenzhi @lizhua @MengfeiJiang @pttzty
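A rough sketch of what that interface could look like, in case it helps the discussion. The nearest-centroid classifier is only a stand-in so the example runs without extra libraries; it is not one of our models. The names `run_classifier`, `fit_centroids`, `predict_centroids`, and the keys of the returned dict are all hypothetical.

```python
import random
from collections import defaultdict


def fit_centroids(train_x, train_y):
    """Stand-in model: one mean vector per class label."""
    groups = defaultdict(list)
    for row, label in zip(train_x, train_y):
        groups[label].append(row)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()}


def predict_centroids(centroids, x):
    """Assign each row to the class whose centroid is closest (squared Euclidean)."""
    def closest(row):
        return min(centroids,
                   key=lambda lab: sum((a - b) ** 2 for a, b in zip(row, centroids[lab])))
    return [closest(row) for row in x]


def run_classifier(train_x, train_y, test_x, n_folds=5, seed=0):
    """Return the model, its parameters, CV accuracies, and test predictions."""
    idx = list(range(len(train_x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]  # assumes len(train_x) >= n_folds

    cv_scores = []
    for held_out in folds:
        held = set(held_out)
        fit_idx = [i for i in idx if i not in held]
        model = fit_centroids([train_x[i] for i in fit_idx],
                              [train_y[i] for i in fit_idx])
        preds = predict_centroids(model, [train_x[i] for i in held_out])
        correct = sum(p == train_y[i] for p, i in zip(preds, held_out))
        cv_scores.append(correct / len(held_out))

    # Refit on all training data before predicting on the caller's test set.
    final_model = fit_centroids(train_x, train_y)
    return {
        "model": final_model,
        "params": {"n_folds": n_folds, "seed": seed},
        "cv_scores": cv_scores,
        "predictions": predict_centroids(final_model, test_x),
    }


# Toy data, made up for the example.
train_x = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [5.0, 5.1], [5.2, 4.9], [4.8, 5.3]]
train_y = ["a", "a", "a", "b", "b", "b"]
result = run_classifier(train_x, train_y, test_x=[[0.1, 0.1], [5.0, 5.0]], n_folds=3)
print(result["cv_scores"], result["predictions"])
```

With this shape, the train/test split happens once outside the function, so every model is evaluated on the same test rows and the predictions are comparable across models.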
Feel free to add anything. @pttzty @changsiyao @ye-zhi @lizhua @lichenzhi