JasonDCox / ML-Mentorship-GovSchool

Implement kNN Supervised Learning Algorithm in Python (Static Data) #1

Closed: gavinjalberghini closed this issue 2 years ago

gavinjalberghini commented 2 years ago

Description: The kNN (k-nearest neighbors) algorithm is a commonly used tool for supervised machine learning. For this ticket, you will write an implementation of kNN for the multi-class classification problem. The output of your model should be a confusion matrix along with compiled metrics. Do not use any ML libraries; write this implementation yourself. The use of online references is allowed and encouraged.

  1. The kNN Algorithm: The program should read an ARFF data file whose path is given on the command line. In addition to implementing the core algorithm, implement three different distance calculations (use the kdnuggets reference for help picking them). Select which distance calculation is used via a command-line flag. The program should write the elapsed time, the selected distance measure, and the full confusion matrix to a file.
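As a rough starting point, the core of item 1 could look like the sketch below. The three distance measures shown (Euclidean, Manhattan, Chebyshev) are only one possible selection and are not necessarily the ones you will pick from the reference. The names `DISTANCES` and `knn_predict`, the default `k=3`, and the `(features, label)` row layout are all assumptions made for illustration.

```python
import math
from collections import Counter


def euclidean(a, b):
    # Straight-line distance between two feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    # Sum of absolute differences along each feature.
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    # Largest absolute difference along any single feature.
    return max(abs(x - y) for x, y in zip(a, b))


# Hypothetical lookup table so a command-line flag can select a measure by name.
DISTANCES = {
    "euclidean": euclidean,
    "manhattan": manhattan,
    "chebyshev": chebyshev,
}


def knn_predict(train, query, k=3, distance=euclidean):
    """Predict a label for `query` by majority vote of its k nearest rows.

    `train` is assumed to be a list of (features, label) pairs, with the
    class value already separated from the features.
    """
    neighbors = sorted(train, key=lambda row: distance(row[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

For example, `knn_predict(train, [0.2, 0.1], k=3, distance=DISTANCES["manhattan"])` classifies one point using the Manhattan measure. How ties between classes are broken is left open here and is worth deciding explicitly in your own version.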

kNN Pseudocode - https://towardsdatascience.com/k-nearest-neighbours-introduction-to-machine-learning-algorithms-18e7ce3d802a
Distance Measures - https://www.kdnuggets.com/2020/11/most-popular-distance-metrics-knn.html
MC Confusion Matrix - https://www.analyticsvidhya.com/blog/2021/06/confusion-matrix-for-multi-class-classification/
Python command line args tutorial - https://www.tutorialspoint.com/python/python_command_line_arguments.htm
Python 3 docs - https://docs.python.org/3/tutorial/
Python for beginners - https://www.youtube.com/watch?v=kqtD5dpn9C8
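For the command-line flag and the elapsed time, one possible approach uses the standard-library `argparse` and `time` modules. This is only a sketch: the flag names (`--distance`, `-k`, `--output`), their defaults, and the distance names are assumptions, and the linked tutorial covers other ways of reading arguments.

```python
import argparse
import time


def build_parser():
    # Flag names and defaults below are illustrative assumptions.
    parser = argparse.ArgumentParser(description="kNN classifier (sketch)")
    parser.add_argument("arff_path", help="path to the ARFF data file")
    parser.add_argument(
        "-d", "--distance",
        choices=["euclidean", "manhattan", "chebyshev"],
        default="euclidean",
        help="which distance measure to use",
    )
    parser.add_argument("-k", type=int, default=3, help="number of neighbors")
    parser.add_argument("-o", "--output", default="results.txt",
                        help="file to write the results to")
    return parser


# An explicit argument list is passed here for demonstration;
# calling parse_args() with no arguments reads sys.argv instead.
args = build_parser().parse_args(["small.arff", "--distance", "manhattan"])

start = time.perf_counter()
# ... load the data and run the classifier here ...
elapsed = time.perf_counter() - start
```

With this layout, a run might look like `python kNN.py small.arff --distance manhattan -k 5`, and `elapsed` holds the run time in seconds for the output file.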

  2. The Data: ARFF is a data format that is common in machine learning. The header of each file specifies the layout of the data. In our case, the last value of each data point is the class value. Be sure to exclude this value when you calculate distance.

Understanding ARFF Data - https://www.cs.waikato.ac.nz/ml/weka/arff.html
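To make the layout concrete, here is a minimal sketch of reading ARFF text. It assumes every attribute except the last is numeric and that the file has no quoted values, missing values (`?`), or sparse rows, so it is not a full ARFF parser. The function name `parse_arff` and the toy data are made up for illustration.

```python
def parse_arff(lines):
    """Split ARFF text into attribute names and (features, label) rows.

    Assumes numeric features with the class value as the last column.
    """
    attributes = []
    rows = []
    in_data = False
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and '%' comment lines.
        if not line or line.startswith("%"):
            continue
        if not in_data:
            lower = line.lower()
            if lower.startswith("@attribute"):
                attributes.append(line.split()[1])
            elif lower.startswith("@data"):
                in_data = True
            continue
        values = [v.strip() for v in line.split(",")]
        # Keep the class value out of the feature vector so it never
        # enters a distance calculation.
        features = [float(v) for v in values[:-1]]
        rows.append((features, values[-1]))
    return attributes, rows


# Toy example, not the contents of small.arff.
sample = """% toy data
@relation toy
@attribute x numeric
@attribute y numeric
@attribute class {a,b}
@data
1.0,2.0,a
3.5,4.5,b
""".splitlines()

attributes, rows = parse_arff(sample)
```

For a real file, something like `with open(path) as f: attributes, rows = parse_arff(f)` would pass the lines in directly.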

Acceptance Criteria: The file kNN.py is created, can ingest and learn on static data files (use small.arff from #files in Discord), then output the desired information.
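For the output side, a multi-class confusion matrix and a few metrics can be computed in plain Python along these lines. This is a sketch under assumptions: rows are taken to be actual classes and columns predicted classes, and the choice of metrics (accuracy, per-class precision and recall) is illustrative rather than a required set. The function names are hypothetical.

```python
def confusion_matrix(actual, predicted, labels):
    # matrix[i][j] counts points whose actual class is labels[i]
    # and whose predicted class is labels[j].
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy(matrix):
    # Fraction of all points that lie on the diagonal.
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def per_class_metrics(matrix, labels):
    metrics = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp                  # rest of the actual-class row
        fp = sum(row[i] for row in matrix) - tp   # rest of the predicted column
        metrics[label] = {
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
        }
    return metrics
```

A small input that can be checked by hand, such as five points with one misclassification, is a practical way to confirm the metrics before running on small.arff.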

gavinjalberghini commented 2 years ago

V: 3

gavinjalberghini commented 2 years ago

Please post progress here for the 10/4 meeting.

brandonC1234 commented 2 years ago

Finally finished the entire thing. Everything seems fine, but I have no way of verifying metrics are correct. Now I just have to get it onto my laptop and implement the command line inputs.

gavinjalberghini commented 2 years ago

Brandon: Completed implementation. Finished command line args. Verified outputs. Has correct output.

Jason: Partial implementation. No command line or output yet. Currently prints out the predictions and actual class values for data.

brandonC1234 commented 2 years ago

complete - Brandon

gavinjalberghini commented 2 years ago

Jason still needs to output the metrics and confusion matrix. Other than this, both implementations are complete. Minor refinements to follow. Excellent work.