berkeley-stat222 / mousestyles

2016 final project
http://berkeley-stat222.github.io/mousestyles/
BSD 2-Clause "Simplified" License

Task list: Project 5 - Classification and Clustering #125

Closed by MengfeiJiang 8 years ago

MengfeiJiang commented 8 years ago
Clustering:
  - [ ] Plot script
  - [ ] Model script
  - [ ] Test
  - [ ] Documentation

Classification:
  - [ ] Plot script
  - [ ] Model script
  - [ ] Test
  - [ ] Documentation

Feel free to add anything. @pttzty @changsiyao @ye-zhi @lizhua @lichenzhi

pttzty commented 8 years ago

Sure, I will take the plot script and part of the model script for clustering.

changsiyao commented 8 years ago

Maybe we should wrap some steps into functions, such as model fitting.

changsiyao commented 8 years ago

@MengfeiJiang @pttzty So we are not using the 2-hr bin features for clustering, is that right?

@ye-zhi @lizhua @lichenzhi The new data loading function for the 2-hr bin features is merged; please continue the classification work with it. Classification performs better with the new data structure.

pttzty commented 8 years ago

@changsiyao Mengfei and I are not using the 2-hour bin features: since we cluster with Euclidean distance, that many features would run into the curse of dimensionality.
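To illustrate the dimensionality concern, here is a minimal sketch on uniform random data (not the mouse features). The helper `relative_contrast` is a hypothetical name; it measures how much farther the farthest point is than the nearest one from a random query. As the number of dimensions grows, that gap tends to shrink, so Euclidean distance separates points less and less.

```python
import math
import random


def relative_contrast(n_points, n_dims, seed=0):
    """Return (max_dist - min_dist) / min_dist from one random query
    point to n_points uniform random points in the unit hypercube."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(n_dims)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        dists.append(math.dist(query, point))  # Euclidean distance
    return (max(dists) - min(dists)) / min(dists)


# Compare a few dimensionalities; smaller values mean the nearest and
# farthest neighbours are almost equally far away.
for n_dims in (2, 20, 200, 2000):
    print(n_dims, round(relative_contrast(200, n_dims), 3))
```

The exact numbers depend on the seed and the synthetic distribution, so this only shows the general trend, not how the real features would behave.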

changsiyao commented 8 years ago

I'm still a little confused about what we should output from our classification models. My current idea is to output at least the model, its parameters, and the CV performance measures. But should we also output the predictions we make? If so, our functions need to take train_x, train_y, and test_x as separate arguments. (Currently we randomly split the data into training and test sets inside the function, so outputting test predictions doesn't make much sense.) @ye-zhi @lichenzhi @lizhua @MengfeiJiang @pttzty
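One possible shape for such a function, as a rough sketch only: the caller passes train_x, train_y, and test_x, and gets back the model, parameters, CV scores, and test predictions in one dict. Everything here is hypothetical (`fit_and_predict`, `fit_centroids`, `predict`, the dict keys), and the nearest-centroid classifier is just a stand-in written with the standard library, not the model used in the project.

```python
import math
from collections import defaultdict


def fit_centroids(xs, ys):
    """Placeholder model: one mean feature vector per class label."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[y].append(x)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()}


def predict(centroids, xs):
    """Assign each row the label of its nearest centroid (Euclidean)."""
    return [min(centroids, key=lambda lab: math.dist(x, centroids[lab]))
            for x in xs]


def fit_and_predict(train_x, train_y, test_x, n_folds=3):
    """Return the model, parameters, CV accuracies and test predictions."""
    cv_scores = []
    for fold in range(n_folds):
        # Rows whose index is congruent to `fold` are held out for validation.
        held = [i for i in range(len(train_x)) if i % n_folds == fold]
        kept = [i for i in range(len(train_x)) if i % n_folds != fold]
        model = fit_centroids([train_x[i] for i in kept],
                              [train_y[i] for i in kept])
        preds = predict(model, [train_x[i] for i in held])
        hits = sum(p == train_y[i] for p, i in zip(preds, held))
        cv_scores.append(hits / len(held))
    # Refit on all training rows before predicting on the test rows.
    final_model = fit_centroids(train_x, train_y)
    return {
        "model": final_model,
        "params": {"n_folds": n_folds},
        "cv_scores": cv_scores,
        "test_predictions": predict(final_model, test_x),
    }


# Toy data with two well-separated classes, only to exercise the interface.
train_x = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
           [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
train_y = ["a", "a", "a", "b", "b", "b"]
test_x = [[0.05, 0.05], [5.1, 5.0]]
result = fit_and_predict(train_x, train_y, test_x)
print(result["cv_scores"], result["test_predictions"])
```

With this layout the train/test split happens outside the function, so the returned test predictions refer to rows the caller chose rather than to an internal random split.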