abronte / PysparkProxy

Seamlessly execute pyspark code on remote clusters
Other
4 stars 0 forks source link

pyspark.ml.* #26

Open abronte opened 5 years ago

abronte commented 5 years ago

Implement pyspark.ml.* apis.

Start with these:

from pyspark.ml.feature import HashingTF, IDF, Tokenizer
from pyspark.ml.feature import OneHotEncoder, StringIndexer, VectorAssembler, IndexToString
from pyspark.ml.feature import Tokenizer, RegexTokenizer, StopWordsRemover
from pyspark.ml.feature import CountVectorizer
from pyspark.ml.feature import MinHashLSH
from pyspark.ml.feature import IndexToString
from pyspark.ml.feature import BucketedRandomProjectionLSH

from pyspark.ml import Pipeline

from pyspark.ml.linalg import Vectors, VectorUDT, DenseVector, Vector

from pyspark.ml.fpm import FPGrowth

from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.classification import NaiveBayes
abronte commented 5 years ago

Looks like this is partially blocked due to some underlying data types needing to be re-worked a bit. #31