bdelespierre / php-kmeans

PHP K-Means
MIT License
91 stars 41 forks source link

Specify a distance function #32

Closed bdelespierre closed 2 years ago

bdelespierre commented 3 years ago

Proposed change

$algo = new Kmeans\Algorithm($init);
$algo->setDistanceFunction(
    fn (Point $a, Point $b) => geocode_dist($a->getCoordinates(), $b->getCoordinates)
);
$result = $algo->clusterize($points, $K);
battlecook commented 3 years ago

Are you accepting the distance as a callback function to give the user more flexibility in injecting the distance function?

If it is modified as suggested, will setDistanceFunction be added to AlgorithmInterface?

bdelespierre commented 3 years ago

Are you accepting the distance as a callback function to give the user more flexibility in injecting the distance function?

Absolutely. So the client code can change just that and keep the rest in place. Otherwise, subtyping Algorithm class would be necessary (which is "technically" better but not very user-friendly IMMHO).

If it is modified as suggested, will setDistanceFunction be added to AlgorithmInterface?

Yes, my bad I forgot it. Here it is:

interface AlgorithmInterface
{
    public function clusterize(PointCollectionInterface $points, int $nbClusters): ClusterCollectionInterface;

    public function setDistanceFunction(callable $function): void;
}

An immediate candidate for this would be hamming dist :+1:

bdelespierre commented 2 years ago

As it turns out, specifying a distance function is not enough: one also needs to provide a centroid determination function as well.

Such an implementation can be seen in GPS\Algorithm implementation, also showing the evolution of the AlgorithmInterface that reflects that necessity to implement both methods (getDistanceBetween and findCentroid.)

Furthermore, I can't think of a valid use-case of clustering points outside both GPS and Euclidean space :thinking: But nonetheless it will be now possible through subclassing of Kmeans\Algorithm.

Hence, this issue is now irrelevant.