cognoma / machine-learning

Machine learning for Project Cognoma
Other
32 stars 47 forks source link

K Nearest Neighbor #41

Closed yigalron closed 7 years ago

yigalron commented 8 years ago

K nearest neighbor - 2 fit methods: one seems to be overfitting, the other one seems better

dhimmel commented 8 years ago

Nice work. Can you export your notebook to a .py file for commenting? You can do this by running the following command from the algorithms directory:

jupyter nbconvert --to=script --FilesWriter.build_directory=scripts KNeighbor-yigalron.ipynb
dhimmel commented 8 years ago

@yigalron would love to see your work merged into master, so this a friendly nudge :smiling_imp:

yigalron commented 8 years ago

Thanks, will do before next meeting; I was away on vacation for the last 2 weeks...

Thanks! Yigal

yigalron commented 8 years ago

@dhimmel ok, ready for review

yigalron commented 7 years ago

I committed a new set of changes; did you get a notification? basically I removed some unused code and also tried different settings for number of neighbors; I also tried "uniform" vs. "distance" classification, and found that the uniform works better.

dhimmel commented 7 years ago

@yigalron, yep got the notification. I'm aiming to get around to reviewing this today.

dhimmel commented 7 years ago

@yigalron, great work getting this pull request up to speed.

It looks like somewhere your notebook became duplicated. KNeighbor-yigalron.ipynb and KNeighbor-yigalron.ipynb exist in the repository's root directory as well as in algorithms. We'll want make sure the final notebooks from this pull request are in algorithms and there are no new files in the root directory.

The results are looking really good -- strong performance and nice ROC curves. There isn't over-fitting in training. It would be great to print out the optimal parameters chosen by grid search.

In the current notebook at fdbf984b518f7ee118098631d02a597e73fec8b6, we don't know which value for n_neighbors was deemed optimal. You can retrieve this with clf_grid.best_params_. Can you add this output? Sorry if you have to run the whole thing again!

dhimmel commented 7 years ago

@yigalron, this is great. So 20 nearest neighbors gave the best cross-validated performance. Good to know. One final thing that would be helpful would be to see is the mean AUROC for each parameter combination. You should be able to retrieve this info from clf_grid after you fit the model.

That's all I can think of -- then I think we'll be ready to merge.

dhimmel commented 7 years ago

The cross-validation plot of performance for various number of neighbors in this notebook looks great!

Only remaining item is to remove the two files in the root directory.

yigalron commented 7 years ago

done - please check it out.

On Mon, Dec 19, 2016 at 7:57 PM, Daniel Himmelstein < notifications@github.com> wrote:

The cross-validation plot of performance for various number of neighbors in this notebook https://github.com/cognoma/machine-learning/blob/3c05ff8f7518e2213c615c3d97281b1a85633e06/algorithms/KNeighbor-yigalron.ipynb looks great!

Only remaining item is to remove the two files in the root directory.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/cognoma/machine-learning/pull/41#issuecomment-268123317, or mute the thread https://github.com/notifications/unsubscribe-auth/ATmEl3yi2d6J6_QNPh7uTvK9tM0P1lKvks5rJygFgaJpZM4JuyMh .

yigalron commented 7 years ago

Thanks! hopefully i'll be able to get to the next meeting and see how we move this forward.

On Wed, Dec 21, 2016 at 2:17 PM, Daniel Himmelstein < notifications@github.com> wrote:

@dhimmel approved this pull request.

Great work seeing this PR through and contributing the K Nearest Neighbors to Cognoma!

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/cognoma/machine-learning/pull/41#pullrequestreview-14030787, or mute the thread https://github.com/notifications/unsubscribe-auth/ATmEl2QLbaWFFo9W9g9qDWXwvilK5hDfks5rKXs3gaJpZM4JuyMh .