Closed halimkun closed 3 years ago
Currently, the only way to set the centroids is to learn them. However, we could perhaps implement a Seeder that initializes the centroids to some known values. Here's the PlusPlus (K-means++) seeder as a reference for the API.
https://docs.rubixml.com/latest/clusterers/seeders/plus-plus.html
Configurable as parameter number seven in K-means (applicable to other clusterers as well)
https://docs.rubixml.com/latest/clusterers/k-means.html
The new Seeder would take a list of known centroids and output k of them when asked to generate seeds. Would this solve your problem @halimkun?
For reference https://stackoverflow.com/questions/38355153/initial-centroids-for-scikit-learn-kmeans-clustering
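To make the proposal concrete, here is a minimal sketch of a seeder that holds a list of known centroids and returns k of them on request. It is not the Rubix ML implementation: the class name `PresetSeederSketch` and its one-argument `seed()` method are hypothetical, and the library's own seeders also receive the dataset when asked for seeds.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for a seeder; not a Rubix ML class.
final class PresetSeederSketch
{
    /** @var array<int, array<int, int|float>> */
    private array $centroids;

    /**
     * @param array<int, array<int, int|float>> $centroids known centroids of equal dimensionality
     */
    public function __construct(array $centroids)
    {
        $centroids = array_values($centroids);

        if ($centroids === []) {
            throw new InvalidArgumentException('At least one centroid is required.');
        }

        // Every centroid must have as many features as the first one.
        $dimensions = count($centroids[0]);

        foreach ($centroids as $centroid) {
            if (count($centroid) !== $dimensions) {
                throw new InvalidArgumentException('All centroids must have the same dimensionality.');
            }
        }

        $this->centroids = $centroids;
    }

    /**
     * Return the first $k preset centroids instead of sampling them from data.
     */
    public function seed(int $k): array
    {
        if ($k < 1 || $k > count($this->centroids)) {
            throw new InvalidArgumentException('k must be between 1 and the number of presets.');
        }

        return array_slice($this->centroids, 0, $k);
    }
}

$seeder = new PresetSeederSketch([[4, 2, 3], [2, 3, 2], [2, 1, 3]]);
$seeds = $seeder->seed(2); // the first two presets, in the order given
```

Validating the dimensionality up front is a design choice of this sketch: it surfaces a malformed preset when the seeder is built rather than later during training.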
Things to consider:
`train()` first ... in cases where the centroids need to stay static, setting `epochs` on K-means to 0 could work.
Ok, we implemented a Preset seeder in the 1.1 branch (see https://github.com/RubixML/ML/commit/5063f5fa7e5e32036a6932aca59008ed70876d48). You can test it with `composer require rubix/ml:"1.1.x-dev"` or wait for the release within a couple of weeks.
https://github.com/RubixML/ML/blob/1.1/docs/clusterers/seeders/preset.md
If you decide to test it, please provide us with your feedback. Thank you :)
Wow, it's available now, really cool mate. Previously I had added a few lines to your KMeans.php file, and it looks like this:
. . .
public function setCentroids($centr){
$this->centroids = $centr;
}
. . .
and add a condition like this to the `train()` function:
// Seed only when no centroids were set manually; otherwise keep the manual ones.
if (empty($this->centroids)) {
    $this->centroids = $this->seeder->seed($dataset, $this->k);
}
For now it works here, though I don't know whether it affects other parts of the code. To use it, just call it after instantiating the class:
$estimator = new KMeans(3,128,1000,10.0, 10, new Euclidean(), new PlusPlus());
$estimator->setCentroids([
[4,2,3],
[2,3,2],
[2,1,3]
]);
But since it's now officially available in the source, I'll switch. Thanks mate.
Nice @halimkun, your solution looks good. From a library's perspective, we didn't want to encourage directly overwriting the centroids after training. Here is an example of how a solution would look using the new Seeder. Note that `epochs` is set to 0 so that only the seeds are used and are not updated. If you wish to use the presets as a "starting point", you can of course train as normal after seeding.
use Rubix\ML\Clusterers\KMeans;
use Rubix\ML\Kernels\Distance\Euclidean;
use Rubix\ML\Clusterers\Seeders\Preset;
$centroids = [
[4,2,3],
[2,3,2],
[2,1,3],
];
$estimator = new KMeans(3, 128, 0, 10.0, 10, new Euclidean(), new Preset($centroids));
$estimator->train($dataset); // If necessary, use a dummy sample here with the correct dimensionality
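With the centroids held static, clustering reduces to assigning each sample to its nearest centroid. The sketch below illustrates that step in plain PHP with Euclidean distance. The helper names `euclideanDistance` and `nearestCentroid` and the sample values are made up for illustration, and this is not the library's code.

```php
<?php

declare(strict_types=1);

/**
 * Euclidean distance between two points of equal dimensionality.
 */
function euclideanDistance(array $a, array $b): float
{
    $sum = 0.0;

    foreach ($a as $i => $value) {
        $sum += ($value - $b[$i]) ** 2;
    }

    return sqrt($sum);
}

/**
 * Index of the centroid closest to the sample (the first one wins on ties).
 */
function nearestCentroid(array $sample, array $centroids): int
{
    $best = 0;
    $bestDistance = INF;

    foreach ($centroids as $index => $centroid) {
        $distance = euclideanDistance($sample, $centroid);

        if ($distance < $bestDistance) {
            $bestDistance = $distance;
            $best = $index;
        }
    }

    return $best;
}

// The fixed centroids from the example above and three illustrative samples.
$centroids = [[4, 2, 3], [2, 3, 2], [2, 1, 3]];
$samples = [[4, 2, 2], [2, 3, 3], [1, 1, 3]];

$labels = array_map(
    fn (array $sample): int => nearestCentroid($sample, $centroids),
    $samples
);
// $labels is [0, 1, 2]: each sample lies one unit from a different centroid.
```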
How to manually set centroids for K-means?
I've looked for this in the existing documentation but it's not there. Sometimes users need to set the centroids manually, based on the data they have available.
For example, the data a user tests may have to meet fixed criteria, perhaps set by their company or by some other requirement.