databricks / spark-perf

Performance tests for Apache Spark
Apache License 2.0

Add perf tests for Gradient-Boosted Decision Trees #80

Closed feynmanliang closed 9 years ago

feynmanliang commented 9 years ago

@jkbradley please review after #79 merges

Adds performance tests for GBDT under shared decision-tree test infrastructure.

feynmanliang commented 9 years ago

@jkbradley Added directions for running against a snapshot in the comments. Would you mind reviewing the code structure? In particular, I don't think the `Either` I'm using to support both RandomForestModel and GBTModel will be extensible.
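For readers following the extensibility concern, here is a minimal sketch of the trade-off, not the code in this PR. `ForestModel`, `BoostedModel`, and every method name below are hypothetical stand-ins for the MLlib model classes, so the example compiles without Spark. It contrasts dispatching on an `Either` with a small type class, which is one possible alternative and was not proposed in the thread.

```scala
// Hypothetical stand-ins for the two ensemble model types; NOT the Spark MLlib API.
final case class ForestModel(trees: Seq[Double => Double]) {
  // A forest averages the predictions of its trees.
  def predict(x: Double): Double = trees.map(t => t(x)).sum / trees.size
}

final case class BoostedModel(trees: Seq[Double => Double], weights: Seq[Double]) {
  // A boosted ensemble takes a weighted sum of its trees' predictions.
  def predict(x: Double): Double =
    trees.zip(weights).map { case (t, w) => w * t(x) }.sum
}

object EitherDispatch {
  // Either has exactly two slots, so a third model type would force a
  // nested Either or a different wrapper at every call site.
  def predictEither(model: Either[ForestModel, BoostedModel], x: Double): Double =
    model match {
      case Left(rf)   => rf.predict(x)
      case Right(gbt) => gbt.predict(x)
    }
}

object TypeClassDispatch {
  // One possible alternative: a type class with one instance per model type.
  trait Predictor[M] {
    def predict(model: M, x: Double): Double
  }

  implicit val forestPredictor: Predictor[ForestModel] =
    new Predictor[ForestModel] {
      def predict(model: ForestModel, x: Double): Double = model.predict(x)
    }

  implicit val boostedPredictor: Predictor[BoostedModel] =
    new Predictor[BoostedModel] {
      def predict(model: BoostedModel, x: Double): Double = model.predict(x)
    }

  // Supporting a new model type only requires a new implicit instance.
  def predictWith[M](model: M, x: Double)(implicit p: Predictor[M]): Double =
    p.predict(model, x)
}
```

With the type class, the shared test code calls `predictWith(model, x)` for any model that has an instance in scope, whereas the `Either` version needs a new match arm and a new wrapper shape for each additional model type.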

jkbradley commented 9 years ago

The config file is the only issue I see. Also, please confirm that it compiles and runs locally. Thanks!

feynmanliang commented 9 years ago

Changes made; note my comment about redundancy in config.py.template.

I'm also not sure why the GBDT tests are running so much slower than the RF tests, even for num-trees=1...

jkbradley commented 9 years ago

LGTM if it compiles. GBDTs don't subsample features, so with a larger number of features (> 50?), forests should be much faster.
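The feature-subsampling point can be made concrete with a rough back-of-the-envelope sketch. It assumes that the split search at each tree node costs roughly in proportion to the number of candidate features examined, which ignores every other factor in real runtimes. The object and function names are hypothetical. The strategy names and rounding are assumed to follow the "all", "sqrt", and "onethird" options of MLlib's `featureSubsetStrategy`.

```scala
object FeatureSubsetSketch {
  // Number of candidate features examined at each tree node.
  // Assumption: rounding up, as a stand-in for the library's behavior.
  def featuresPerNode(numFeatures: Int, strategy: String): Int = strategy match {
    case "all"      => numFeatures                                    // no subsampling (the GBDT case)
    case "sqrt"     => math.ceil(math.sqrt(numFeatures.toDouble)).toInt
    case "onethird" => math.ceil(numFeatures / 3.0).toInt
    case other      => throw new IllegalArgumentException(s"Unknown strategy: $other")
  }

  // Rough per-node work of examining all features relative to a subsampled
  // strategy, under the proportional-cost assumption stated above.
  def relativeSplitWork(numFeatures: Int, strategy: String): Double =
    numFeatures.toDouble / featuresPerNode(numFeatures, strategy)
}
```

Under these assumptions, with 100 features a forest using "sqrt" examines 10 features per node against 100 for a learner that uses all of them. That is about a 10x gap in split-search work per node, and the gap widens as the feature count grows.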