bdelespierre / php-kmeans

PHP K-Means
MIT License

Resume algorithm execution #28

Open bdelespierre opened 3 years ago

bdelespierre commented 3 years ago

I believe it would be nice to be able to resume the algorithm after it has completed. This would be useful when new points are added, so that previous iterations don't need to be re-run.

Example: I have clustered my 100 000 users into 5 clusters. Since the last clustering, 100 new users have been added. Most of them are probably already very close to the existing clusters' centroids. Hence, I should be able to resume clustering the same dataset PLUS the new users to save time.

battlecook commented 3 years ago

It would be good to provide this as an option, because the added points will also affect how the clusters are formed: clustering 100000 users and clustering 100100 users may give different results. So it would be nice to have two options when using the library:

  1. After 100000 users are clustered, 100 additional users are clustered
  2. Re-clustering 100100 users
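
To make the difference between the two options concrete, here is a minimal sketch of option 1 using plain PHP arrays rather than the library's classes. The function names `squaredDistance` and `assignToNearest` are hypothetical and exist only for this illustration. Each new point is attached to its nearest existing centroid and the centroids themselves are never moved, which is exactly why the result can differ from option 2 (a full re-clustering).

```php
<?php

declare(strict_types=1);

/**
 * Squared Euclidean distance between two points of equal dimension.
 *
 * @param list<float> $a
 * @param list<float> $b
 */
function squaredDistance(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $i => $value) {
        $sum += ($value - $b[$i]) ** 2;
    }
    return $sum;
}

/**
 * Option 1 (hypothetical helper): assign each new point to its nearest
 * existing centroid. The centroids are left untouched.
 *
 * @param list<list<float>> $centroids
 * @param list<list<float>> $newPoints
 * @return list<int> index of the nearest centroid for each new point
 */
function assignToNearest(array $centroids, array $newPoints): array
{
    $assignments = [];
    foreach ($newPoints as $point) {
        $best = 0;
        $bestDistance = INF;
        foreach ($centroids as $index => $centroid) {
            $distance = squaredDistance($point, $centroid);
            if ($distance < $bestDistance) {
                $bestDistance = $distance;
                $best = $index;
            }
        }
        $assignments[] = $best;
    }
    return $assignments;
}

// Centroids from an earlier run, and a few users added since then.
$centroids   = [[0.0, 0.0], [10.0, 10.0]];
$newUsers    = [[1.0, 2.0], [9.0, 8.0], [4.0, 4.0]];
$assignments = assignToNearest($centroids, $newUsers); // [0, 1, 0]
```

This is cheap (one pass over the new points), but the clusters slowly drift away from what a full run would produce as more points accumulate.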

bdelespierre commented 3 years ago

Yes. I would propose something like:

$algo = new Kmeans\Algorithm(new Kmeans\RandomInitialization());

$result = $algo->clusterize($points, $nbClusters);

$serialized = serialize($result);

// later...

$previousRun = unserialize($serialized);

$result = $previousRun->resume($newPoints);
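
One way `resume()` could work internally is a warm start: run the usual iterations over the old and new points together, but seed them with the centroids of the previous run instead of a fresh initialization. The sketch below illustrates that idea with plain arrays. The `lloyd` function and its return shape are assumptions made for this example, not the library's API.

```php
<?php

declare(strict_types=1);

/**
 * Run Lloyd iterations from the given starting centroids (hypothetical helper).
 *
 * @param list<list<float>> $points
 * @param list<list<float>> $centroids starting centroids, e.g. from a previous run
 * @return array{centroids: list<list<float>>, iterations: int, converged: bool}
 */
function lloyd(array $points, array $centroids, int $maxIterations = 100): array
{
    for ($iteration = 1; $iteration <= $maxIterations; $iteration++) {
        // Assignment step: group points by nearest centroid.
        $groups = array_fill(0, count($centroids), []);
        foreach ($points as $point) {
            $best = 0;
            $bestDistance = INF;
            foreach ($centroids as $index => $centroid) {
                $distance = 0.0;
                foreach ($point as $d => $value) {
                    $distance += ($value - $centroid[$d]) ** 2;
                }
                if ($distance < $bestDistance) {
                    $bestDistance = $distance;
                    $best = $index;
                }
            }
            $groups[$best][] = $point;
        }

        // Update step: move each centroid to the mean of its group.
        $updated = $centroids;
        foreach ($groups as $index => $group) {
            if ($group === []) {
                continue; // keep the old centroid for an empty cluster
            }
            foreach (array_keys($centroids[$index]) as $d) {
                $updated[$index][$d] = array_sum(array_column($group, $d)) / count($group);
            }
        }

        // Converged when no centroid moved during this iteration.
        if ($updated === $centroids) {
            return ['centroids' => $centroids, 'iterations' => $iteration, 'converged' => true];
        }
        $centroids = $updated;
    }

    return ['centroids' => $centroids, 'iterations' => $maxIterations, 'converged' => false];
}

$points   = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]];
$firstRun = lloyd($points, [[0.0, 0.0], [10.0, 10.0]]);

// "Resume": same data plus the new points, seeded with the previous centroids.
$newPoints = [[1.0, 1.0], [11.0, 11.0]];
$resumed   = lloyd(array_merge($points, $newPoints), $firstRun['centroids']);
```

When the new points lie close to the existing centroids, the seeded run should need only a few iterations, which is where the time saving would come from.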

battlecook commented 3 years ago

looks good 👍

bdelespierre commented 3 years ago

I've been thinking about a result object for Algorithm::clusterize. What do you think of this API?

<?php

namespace Bdelespierre\Kmeans\Interfaces;

interface ClusterizationResultInterface extends \Serializable
{
    public function hasReachedConvergence(): bool;

    /**
     * @return int<0, max>
     */
    public function iterationsCount(): int;

    public function getClusters(): ClusterCollectionInterface;

    public function resume(PointCollectionInterface $newPoints): self;
}
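
For the serialization part, here is a small sketch of a result object that survives a `serialize()` / `unserialize()` round trip. The `ResultSnapshot` class is hypothetical and stores plain arrays instead of the library's collection types. It uses the `__serialize()` / `__unserialize()` magic methods because, as of PHP 8.1, implementing `\Serializable` without also providing those two methods is deprecated. Constructor promotion requires PHP 8.0 or later.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical result snapshot, for illustration only; not part of the library.
 */
final class ResultSnapshot
{
    /**
     * @param list<list<float>> $centroids
     */
    public function __construct(
        private array $centroids,
        private int $iterations,
        private bool $converged
    ) {
    }

    public function hasReachedConvergence(): bool
    {
        return $this->converged;
    }

    public function iterationsCount(): int
    {
        return $this->iterations;
    }

    /**
     * @return list<list<float>>
     */
    public function getCentroids(): array
    {
        return $this->centroids;
    }

    /** State written by serialize(). */
    public function __serialize(): array
    {
        return [
            'centroids'  => $this->centroids,
            'iterations' => $this->iterations,
            'converged'  => $this->converged,
        ];
    }

    /** State restored by unserialize(). */
    public function __unserialize(array $data): void
    {
        $this->centroids  = $data['centroids'];
        $this->iterations = $data['iterations'];
        $this->converged  = $data['converged'];
    }
}

$serialized = serialize(new ResultSnapshot([[0.0, 1.0], [10.0, 11.0]], 2, true));

// later...
$previousRun = unserialize($serialized, ['allowed_classes' => [ResultSnapshot::class]]);
```

Restricting `allowed_classes` is a precaution for the case where the serialized string is stored somewhere it could be tampered with.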

battlecook commented 3 years ago

Sorry for the late reply. (I saw that it was committed to the PR.)

I think it's fine, but we'll need to do some more work before we can be confident about the interface design.

bdelespierre commented 3 years ago

It's not implemented in #27. I plan to implement it later.