Closed: yigalron closed this pull request 7 years ago
Nice work. Can you export your notebook to a `.py` file for commenting? You can do this by running the following command from the `algorithms` directory:

```sh
jupyter nbconvert --to=script --FilesWriter.build_directory=scripts KNeighbor-yigalron.ipynb
```
@yigalron would love to see your work merged into master, so this is a friendly nudge :smiling_imp:
Thanks, will do before the next meeting; I was away on vacation for the last 2 weeks...
Thanks! Yigal
@dhimmel ok, ready for review
I committed a new set of changes; did you get a notification? Basically, I removed some unused code and tried different settings for the number of neighbors. I also compared "uniform" vs. "distance" weighting and found that uniform works better.
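For context, a comparison like this can be sketched with scikit-learn's `cross_val_score`. This is a minimal, self-contained illustration on synthetic data, not the notebook's actual code; the real notebook uses Cognoma's expression data, and the `n_neighbors` value here is just a placeholder:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data; the notebook uses Cognoma expression data instead.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Compare the two weighting schemes under identical cross-validation.
for weights in ["uniform", "distance"]:
    clf = KNeighborsClassifier(n_neighbors=20, weights=weights)
    scores = cross_val_score(clf, X, y, scoring="roc_auc", cv=5)
    print(weights, scores.mean())
```

Whichever scheme has the higher mean AUROC across folds is the better choice for that dataset.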
@yigalron, yep got the notification. I'm aiming to get around to reviewing this today.
@yigalron, great work getting this pull request up to speed.
It looks like somewhere your notebook became duplicated: `KNeighbor-yigalron.ipynb` exists in the repository's root directory as well as in `algorithms`. We'll want to make sure the final notebooks from this pull request are in `algorithms` and that there are no new files in the root directory.
The results are looking really good: strong performance and nice ROC curves, with no sign of over-fitting on the training data. It would be great to print out the optimal parameters chosen by the grid search. In the current notebook at fdbf984b518f7ee118098631d02a597e73fec8b6, we don't know which value of `n_neighbors` was deemed optimal. You can retrieve it with `clf_grid.best_params_`. Can you add this output? Sorry if you have to run the whole thing again!
@yigalron, this is great. So 20 nearest neighbors gave the best cross-validated performance. Good to know. One final thing that would be helpful is to see the mean AUROC for each parameter combination. You should be able to retrieve this info from `clf_grid` after you fit the model.
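One way to do this (in scikit-learn 0.18 and later, where `GridSearchCV` exposes `cv_results_`; older releases used `grid_scores_` instead) is to tabulate the results with pandas. Again, the data and parameter grid here are illustrative placeholders:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data; the notebook uses Cognoma expression data instead.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

clf_grid = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [5, 10, 20], "weights": ["uniform", "distance"]},
    scoring="roc_auc",
    cv=3,
)
clf_grid.fit(X, y)

# cv_results_ has one row per parameter combination; mean_test_score is
# the mean AUROC across folds because scoring="roc_auc".
results = pd.DataFrame(clf_grid.cv_results_)
print(results[["param_n_neighbors", "param_weights", "mean_test_score"]])
```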
That's all I can think of -- then I think we'll be ready to merge.
The cross-validation plot of performance for various numbers of neighbors in [this notebook](https://github.com/cognoma/machine-learning/blob/3c05ff8f7518e2213c615c3d97281b1a85633e06/algorithms/KNeighbor-yigalron.ipynb) looks great!
Only remaining item is to remove the two files in the root directory.
done - please check it out.
Thanks! Hopefully I'll be able to get to the next meeting and see how we move this forward.
@dhimmel approved this pull request.
Great work seeing this PR through and contributing the K Nearest Neighbors to Cognoma!
K nearest neighbor - two fit methods: one seems to overfit, the other seems better