Closed carterbox closed 2 years ago
Hello @carterbox! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:
This feature introduces a clustering approach to mini-batch selection. The code is clean and tests are passing. Thank you for your contribution!
Purpose
Related to #145. Implements an algorithm for compact batch selection.
Approach
Uses a modified k-means clustering algorithm that constrains the clusters to be approximately equal in size. It starts with k-means++ to initialize the clusters, then cycles through the points, swapping them between clusters whenever a swap reduces the total distance from each point to its cluster centroid. This swapping heuristic does not strictly minimize the k-means objective, but it does a good job of creating clusters without enclaves.
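For reviewers who want a concrete picture of the approach, here is a minimal standard-library sketch. It is not the code in this PR: the function names (`kmeans_pp_init`, `balanced_assign`, `swap_pass`, `equal_size_clusters`), the greedy capacity-limited first assignment, and the pairwise-swap rule are all assumptions chosen to illustrate the description above. Points are assumed to be distinct tuples of floats.

```python
import math
import random


def kmeans_pp_init(points, k, rng):
    """Choose k starting centroids with k-means++ seeding."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Squared distance from each point to its nearest chosen centroid.
        d2 = [min(math.dist(p, c) ** 2 for c in centroids) for p in points]
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids


def balanced_assign(points, centroids):
    """Greedily give each point its nearest centroid that still has room."""
    n, k = len(points), len(centroids)
    # Capacities differ by at most one and sum to n.
    room = [n // k + (1 if c < n % k else 0) for c in range(k)]
    labels = [None] * n
    # Points that sit closest to some centroid choose first.
    order = sorted(
        range(n), key=lambda i: min(math.dist(points[i], c) for c in centroids)
    )
    for i in order:
        open_clusters = [c for c in range(k) if room[c] > 0]
        best = min(open_clusters, key=lambda c: math.dist(points[i], centroids[c]))
        labels[i] = best
        room[best] -= 1
    return labels


def mean(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(xs) / len(cluster) for xs in zip(*cluster))


def swap_pass(points, labels, centroids):
    """Swap pairs across clusters when that lowers the summed distance."""
    dist = [[math.dist(p, c) for c in centroids] for p in points]
    swapped = False
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            a, b = labels[i], labels[j]
            # A swap keeps both cluster sizes unchanged.
            if a != b and dist[i][b] + dist[j][a] < dist[i][a] + dist[j][b] - 1e-12:
                labels[i], labels[j] = b, a
                swapped = True
    return swapped


def equal_size_clusters(points, k, max_iter=20, seed=0):
    """Return one cluster label per point, with near-equal cluster sizes."""
    rng = random.Random(seed)
    centroids = kmeans_pp_init(points, k, rng)
    labels = balanced_assign(points, centroids)
    for _ in range(max_iter):
        # Recompute centroids, then look for improving swaps.
        centroids = [
            mean([p for p, lab in zip(points, labels) if lab == c])
            for c in range(k)
        ]
        if not swap_pass(points, labels, centroids):
            break
    return labels


# Usage: cluster 62 random 2-D points into 4 batches.
demo_rng = random.Random(1)
demo_points = [(demo_rng.random(), demo_rng.random()) for _ in range(62)]
demo_labels = equal_size_clusters(demo_points, 4)
```

Because points only ever trade places, the cluster sizes fixed by the first assignment never change. The pairwise scan is quadratic in the number of points per pass, which is acceptable for a sketch but may not match what the PR does.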
Pre-Merge Checklists
Submitter
yapf to format Python code.
Reviewer