databricks / spark-sql-perf


[ML-4069] Improve timing of estimators #161

Closed: jkbradley closed this 6 years ago

jkbradley commented 6 years ago

This gives the following running times:

recommendation.ALS                          72.083s
classification.DecisionTreeClassification   37.125s
classification.DecisionTreeClassification   33.274s
regression.DecisionTreeRegression           31.252s
regression.DecisionTreeRegression           63.35s
fpm.FPGrowth                                6.219s
fpm.FPGrowth                                5.342s
classification.GBTClassification            46.154s
regression.GBTRegression                    45.832s
clustering.GaussianMixture                  18.936s
regression.GLMRegression                    20.342s
clustering.KMeans                           32.473s
clustering.LDA                              44.574s
clustering.LDA                              24.658s
classification.LinearSVC                    39.84s
regression.LinearRegression                 43.335s
classification.LogisticRegression           41.637s
classification.LogisticRegression           37.711s
classification.NaiveBayes                   23.351s
classification.RandomForestClassification   20.781s
regression.RandomForestRegression           39.971s
feature.Word2Vec                            51.892s

jkbradley commented 6 years ago

I don't think it's worthwhile trying to get the timings closer to 30 sec: timings do not always scale linearly, and some algorithms (like Word2Vec) need more Params supported before we can fine-tune their timings.
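
For reference, here is a rough sketch of how far each listed timing sits from the 30 sec figure. It is not part of this PR or of spark-sql-perf: the names `TIMINGS`, `TARGET_S`, `naive_scale_factor` and `summarize` are hypothetical, and the "naive factor" column assumes runtime is proportional to a single size-like setting, which is exactly the assumption that does not always hold.

```python
# Timings copied from the list in this PR, in seconds. Repeated names are
# kept as separate entries, so a list of pairs is used instead of a dict.
TIMINGS = [
    ("recommendation.ALS", 72.083),
    ("classification.DecisionTreeClassification", 37.125),
    ("classification.DecisionTreeClassification", 33.274),
    ("regression.DecisionTreeRegression", 31.252),
    ("regression.DecisionTreeRegression", 63.35),
    ("fpm.FPGrowth", 6.219),
    ("fpm.FPGrowth", 5.342),
    ("classification.GBTClassification", 46.154),
    ("regression.GBTRegression", 45.832),
    ("clustering.GaussianMixture", 18.936),
    ("regression.GLMRegression", 20.342),
    ("clustering.KMeans", 32.473),
    ("clustering.LDA", 44.574),
    ("clustering.LDA", 24.658),
    ("classification.LinearSVC", 39.84),
    ("regression.LinearRegression", 43.335),
    ("classification.LogisticRegression", 41.637),
    ("classification.LogisticRegression", 37.711),
    ("classification.NaiveBayes", 23.351),
    ("classification.RandomForestClassification", 20.781),
    ("regression.RandomForestRegression", 39.971),
    ("feature.Word2Vec", 51.892),
]

TARGET_S = 30.0  # the "30 sec" figure from the comment, treated as a target


def naive_scale_factor(measured_s, target_s=TARGET_S):
    """Multiplier for a size-like setting that would hit the target,
    ASSUMING runtime is proportional to that setting (often untrue)."""
    return target_s / measured_s


def summarize(timings, target_s=TARGET_S):
    """Return (name, seconds, ratio to target, naive factor) per entry."""
    return [
        (name, secs, secs / target_s, naive_scale_factor(secs, target_s))
        for name, secs in timings
    ]


if __name__ == "__main__":
    for name, secs, ratio, factor in summarize(TIMINGS):
        print(f"{name:44s}{secs:8.3f}s  {ratio:4.2f}x target  "
              f"naive factor {factor:4.2f}")
```

The naive factor is only a first guess. For algorithms whose cost is not linear in the setting being changed, applying it would overshoot or undershoot the target.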

jkbradley commented 6 years ago

I updated the list of timings to reflect the latest changes for LDA and LogReg.

yogeshg commented 6 years ago

LGTM! Thanks

jkbradley commented 6 years ago

Thanks! I'll merge this.