ddf-project / DDF

Distributed DataFrame: Productivity = Power x Simplicity For Scientists & Engineers, on any Data Engine
http://ddf.io
Apache License 2.0

add more algorithms #53

Open ljzzju opened 9 years ago

ljzzju commented 9 years ago

Sorry in advance; I was not sure where to put this issue.

By using Java reflection, DDF can easily support the MLlib algorithms.

I have found that the method name is restricted to MLClassMethods.DEFAULT_TRAIN_METHOD_NAME, which is defined as "train" in io.ddf.ml.MLClassMethods.

This is fine for many MLlib algorithms because they provide a "train" method.

However, there are exceptions, e.g. RandomForest. Thus RF cannot simply be defined the way KMeans is, like:

public IModel decisionTree(args....) throws DDFException { return this.train("decisionTree", args...); }
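To make the problem concrete, here is a minimal sketch of a reflective lookup with a fixed method name. FakeKMeans, FakeRandomForest, and FixedNameDispatch are hypothetical stand-ins written for this example, not DDF or Spark classes, and the real DDF lookup may differ in detail. As far as I know, MLlib's RandomForest exposes static trainClassifier and trainRegressor methods rather than "train", which is the situation imitated here.

```java
import java.lang.reflect.Method;

// Hypothetical stand-in for an MLlib-style class that has a static "train" method.
final class FakeKMeans {
    public static String train(int numCentroids) {
        return "kmeans(k=" + numCentroids + ")";
    }
}

// Hypothetical stand-in for a class whose training method is NOT called "train".
final class FakeRandomForest {
    public static String trainClassifier(int numTrees) {
        return "randomForest(trees=" + numTrees + ")";
    }
}

final class FixedNameDispatch {
    // Mirrors the idea of a single hard-coded method name (assumed value: "train").
    static final String DEFAULT_TRAIN_METHOD_NAME = "train";

    // Looks up a public static method with the fixed name and one int parameter,
    // then invokes it reflectively.
    static Object trainByFixedName(Class<?> algo, int arg) throws Exception {
        Method m = algo.getMethod(DEFAULT_TRAIN_METHOD_NAME, int.class);
        return m.invoke(null, arg);
    }

    public static void main(String[] args) throws Exception {
        // Works: FakeKMeans has a "train" method.
        System.out.println(trainByFixedName(FakeKMeans.class, 3));
        try {
            // Fails: FakeRandomForest only has "trainClassifier".
            trainByFixedName(FakeRandomForest.class, 10);
        } catch (NoSuchMethodException e) {
            System.out.println("lookup failed: " + e.getMessage());
        }
    }
}
```

The second call fails at lookup time, before any training happens, because the only name the dispatcher will try is "train".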

So I wonder whether this should be changed so that the actual training method name can be specified for each MLlib algorithm.

Here is my suggestion:

(1) Extend the current algorithm training entry point with a training-method parameter, e.g.

current: public IModel train(String trainMethodName, Object... paramArgs) throws DDFException

modified: public IModel train(String trainMethodName, String runMethodName, Object... paramArgs) throws DDFException

(2) The API provided to users should not include runMethodName, so the current DDF algorithm API entry points stay unchanged, e.g.

modified: public IModel KMeans(int numCentroids, int maxIters, int runs) throws DDFException { return this.train("kmeans", "train", numCentroids, maxIters, runs); }
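The two points above could be sketched roughly as follows. This is only an illustration under assumptions: MiniMLFacade, DemoKMeans, DemoRandomForest, the name-to-class map, and the randomForest wrapper are all hypothetical. It returns Object and throws Exception instead of DDF's IModel and DDFException so that it compiles without DDF on the classpath. Matching a method by name and argument count is a simplification that would not handle overloads with the same arity.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for MLlib-style classes; not the real Spark API.
final class DemoKMeans {
    public static String train(int numCentroids, int maxIters, int runs) {
        return "kmeans:" + numCentroids + ":" + maxIters + ":" + runs;
    }
}

final class DemoRandomForest {
    public static String trainClassifier(int numTrees, int maxDepth) {
        return "rf:" + numTrees + ":" + maxDepth;
    }
}

final class MiniMLFacade {
    // Hypothetical registry from algorithm name to implementing class.
    private final Map<String, Class<?>> algorithms = new HashMap<>();

    MiniMLFacade() {
        algorithms.put("kmeans", DemoKMeans.class);
        algorithms.put("randomForest", DemoRandomForest.class);
    }

    // Suggestion (1): the caller names the static method to run.
    Object train(String trainMethodName, String runMethodName, Object... paramArgs)
            throws Exception {
        Class<?> algo = algorithms.get(trainMethodName);
        if (algo == null) {
            throw new IllegalArgumentException("unknown algorithm: " + trainMethodName);
        }
        // Pick the first public static method with a matching name and arity.
        for (Method m : algo.getMethods()) {
            if (m.getName().equals(runMethodName)
                    && Modifier.isStatic(m.getModifiers())
                    && m.getParameterCount() == paramArgs.length) {
                return m.invoke(null, paramArgs);
            }
        }
        throw new NoSuchMethodException(algo.getName() + "." + runMethodName);
    }

    // Suggestion (2): user-facing wrappers hide runMethodName.
    Object kMeans(int numCentroids, int maxIters, int runs) throws Exception {
        return train("kmeans", "train", numCentroids, maxIters, runs);
    }

    Object randomForest(int numTrees, int maxDepth) throws Exception {
        return train("randomForest", "trainClassifier", numTrees, maxDepth);
    }

    public static void main(String[] args) throws Exception {
        MiniMLFacade ml = new MiniMLFacade();
        System.out.println(ml.kMeans(3, 10, 1));
        System.out.println(ml.randomForest(5, 4));
    }
}
```

With this shape, adding an algorithm whose entry point is not called "train" only needs a new wrapper that passes the right method name; callers of kMeans see no change.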

Any help would be appreciated, thanks.

// //// ISupportML //////

/**

khangich commented 9 years ago

@ljzzju I think this is a good idea. Can you create a PR for this?

binhmop commented 9 years ago

@ljzzju Please post this to https://groups.google.com/forum/#!forum/ddf-project for discussion. Thanks.

ctn commented 9 years ago

We've recently implemented broad support for MLLib. @Huandao0812 is this going into a PR yet?