daphne-eu / daphne

DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines
Apache License 2.0

running kmeans with --vec takes longer than without #898

Open philipportner opened 2 weeks ago

philipportner commented 2 weeks ago

When running the following kmeans script, adding the `--vec` flag increases the wall-clock runtime by a bit more than 50% (22.6 s without the flag, 34.5 s with it).

kmeans.daphne

Same script can be found in `test/api/cli/algorithms/kmeans.daphne`

```
// K-means clustering.

// Arguments:
// - r ... number of records
// - c ... number of centroids
// - f ... number of features
// - i ... number of iterations

// Data generation.
X = rand($r, $f, 0.0, 1.0, 1, -1);
C = rand($c, $f, 0.0, 1.0, 1, -1);

// K-means clustering (decisive part).
for(i in 1:$i) {
    D = (X @ t(C)) * -2 + t(sum(C ^ 2, 0));
    minD = aggMin(D, 0);
    P = D <= minD;
    P = P / sum(P, 0);
    P_denom = sum(P, 1);
    C = (t(P) @ X) / t(P_denom);
}

// Result output.
print(C);
```
CLI command to run kmeans.daphne

`time bin/daphne test/api/cli/algorithms/kmeans.daphne r=1000000 f=100 c=50 i=10`

`time bin/daphne --vec test/api/cli/algorithms/kmeans.daphne r=1000000 f=100 c=50 i=10`
time output

`bin/daphne test/api/cli/algorithms/kmeans.daphne r=1000000 f=100 c=50 i=10 115.16s user 136.03s system 1109% cpu 22.634 total`

`bin/daphne --vec test/api/cli/algorithms/kmeans.daphne r=1000000 f=100 c=50 461.03s user 821.06s system 3718% cpu 34.477 total`
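
To put the two `time` lines side by side, here is a small sketch that derives the slowdown from the numbers above. The dictionary layout and the names `runs`, `cpu_seconds`, `wall_slowdown`, `cpu_ratio`, and `avg_busy_cores` are made up for this illustration; only the timings themselves come from the output.

```python
# Timings copied from the `time` output above (all values in seconds).
runs = {
    "default": {"user": 115.16, "system": 136.03, "wall": 22.634},
    "--vec": {"user": 461.03, "system": 821.06, "wall": 34.477},
}


def cpu_seconds(run):
    """Total CPU time: user plus system."""
    return run["user"] + run["system"]


base = runs["default"]
vec = runs["--vec"]

# How much longer the --vec run takes end to end.
wall_slowdown = vec["wall"] / base["wall"]

# How much more CPU time the --vec run burns in total.
cpu_ratio = cpu_seconds(vec) / cpu_seconds(base)

# Average number of busy cores, which should match the "% cpu" column / 100.
avg_busy_cores = {name: cpu_seconds(run) / run["wall"] for name, run in runs.items()}

print(f"wall-clock slowdown with --vec: {wall_slowdown:.2f}x")
print(f"total CPU time ratio (--vec / default): {cpu_ratio:.2f}x")
for name, cores in avg_busy_cores.items():
    print(f"{name}: ~{cores:.1f} cores busy on average")
```

Beyond the roughly 1.5x wall-clock slowdown, the `--vec` run uses about 5x the total CPU time, and most of the increase is system time (821.06s vs. 136.03s). This is only arithmetic on the reported numbers, not a diagnosis of the cause.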