cmdevries / LMW-tree

Learning M-Way Tree - Web Scale Clustering - EM-tree, K-tree, k-means, TSVQ, repeated k-means, bitwise clustering
http://lmwtree.devries.ninja
BSD 3-Clause "New" or "Revised" License

Implement distributed version of streaming parallel EM-tree #5

Open cmdevries opened 10 years ago

cmdevries commented 10 years ago

Compress the integer accumulator vectors transmitted between machines using https://github.com/lemire/FastPFOR.
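To make the compression idea concrete, here is a minimal sketch of packing a vector of signed integer accumulators into bytes before sending it over the wire. It does not use FastPFOR; it swaps in plain zigzag plus variable-byte encoding, a much simpler scheme, just to show the shape of an encode/decode pair. The names `encode_accumulators` and `decode_accumulators` are hypothetical and not part of LMW-tree, and the assumption that accumulators fit in 32-bit signed integers is mine.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Zigzag maps signed values to unsigned so that small magnitudes
// (positive or negative) become small codes: 0,-1,1,-2,... -> 0,1,2,3,...
inline std::uint32_t zigzag(std::int32_t v) {
    return (static_cast<std::uint32_t>(v) << 1) ^ (v < 0 ? 0xFFFFFFFFu : 0u);
}

inline std::int32_t unzigzag(std::uint32_t u) {
    return static_cast<std::int32_t>((u >> 1) ^ (0u - (u & 1u)));
}

// Hypothetical helper: variable-byte encode each accumulator,
// 7 payload bits per byte, high bit set when more bytes follow.
std::vector<std::uint8_t> encode_accumulators(const std::vector<std::int32_t>& acc) {
    std::vector<std::uint8_t> out;
    out.reserve(acc.size());
    for (std::int32_t v : acc) {
        std::uint32_t u = zigzag(v);
        while (u >= 0x80u) {
            out.push_back(static_cast<std::uint8_t>((u & 0x7Fu) | 0x80u));
            u >>= 7;
        }
        out.push_back(static_cast<std::uint8_t>(u));
    }
    return out;
}

// Hypothetical helper: inverse of encode_accumulators.
// Throws if the byte stream is truncated or a value exceeds 32 bits.
std::vector<std::int32_t> decode_accumulators(const std::vector<std::uint8_t>& bytes) {
    std::vector<std::int32_t> acc;
    std::uint32_t u = 0;
    unsigned shift = 0;
    for (std::uint8_t b : bytes) {
        if (shift > 28) throw std::runtime_error("value too long");
        u |= static_cast<std::uint32_t>(b & 0x7Fu) << shift;
        if (b & 0x80u) {
            shift += 7;               // continuation bit set, keep reading
        } else {
            acc.push_back(unzigzag(u));
            u = 0;
            shift = 0;
        }
    }
    if (shift != 0) throw std::runtime_error("truncated input");
    return acc;
}
```

A sender would call `encode_accumulators` on each accumulator vector and ship the resulting bytes; the receiver calls `decode_accumulators` and adds the values into its own accumulators. This only pays off when most values are small, and a block-based codec such as FastPFOR would likely be faster and tighter on large vectors.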

Hadoop + HDFS (just get Hadoop to hand over the bytes, or use HDFS directly).

ZeroMQ + GlusterFS.

Apache Spark (https://github.com/apache/spark) might work well with Python bindings for the library.

HDFS + Erlang scheduler (gascheduler) + C++ code as a simple TCP server.