High performance cross product

jiazou-bigdata commented 2 years ago

Implementation 1 (the old implementation): UDF-based selection, with all trees encoded in a single UDF.

Commands:

ubuntu@ip-172-31-16-75:~/netsdb$ scripts/cleanupNode.sh

ubuntu@ip-172-31-16-75:~/netsdb$ ./scripts/startPseudoCluster.py 8 4000

bin/testDecisionForest Y 2200000 28 275000 F A 32 model-inference/decisionTree/experiments/HIGGS.csv_test.csv

bin/testDecisionForest N 2200000 28 275000 F A 32 model-inference/decisionTree/experiments/HIGGS.csv_test.csv model-inference/decisionTree/experiments/models/higgs_xgboost_1600_8_netsdb XGBoost

Execution Results: output count:2200000 positive count:1174745

Execution Time: UDF Execution Time Duration: **94.6148** secs.

Implementation 2 (the new implementation): a cross-product-based implementation. Each thread is responsible for a subset of the trees, every sample is sent to every thread for prediction, and the per-thread prediction results are then aggregated.
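To make the data flow of the cross-product approach concrete, here is a minimal Python sketch. It is not the netsDB code. The `Tree` layout, the round-robin tree partitioning and all function names are hypothetical, and it assumes XGBoost-style binary aggregation (leaf values summed into a margin, base score ignored, positive when the margin is > 0). For comparison it also includes a baseline that, like implementation 1, walks every tree for each sample inside one function.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Tree:
    """Hypothetical flattened binary tree; node i is a leaf when feature[i] == -1."""
    feature: List[int]
    threshold: List[float]
    left: List[int]
    right: List[int]
    value: List[float]

    def predict(self, x: Sequence[float]) -> float:
        node = 0
        while self.feature[node] != -1:
            f = self.feature[node]
            node = self.left[node] if x[f] < self.threshold[node] else self.right[node]
        return self.value[node]


def predict_all_trees_per_sample(trees: List[Tree], samples: List[Sequence[float]]) -> List[float]:
    """Baseline: one function walks every tree for each sample (like implementation 1)."""
    return [sum(t.predict(x) for t in trees) for x in samples]


def predict_cross_product(trees: List[Tree], samples: List[Sequence[float]],
                          num_threads: int = 4) -> List[float]:
    """Cross-product sketch: each worker owns a subset of trees and scores every sample."""
    # Hypothetical round-robin partitioning of the trees across workers.
    partitions = [trees[i::num_threads] for i in range(num_threads)]

    def score_partition(part: List[Tree]) -> List[float]:
        # Partial margin of every sample over this worker's trees only.
        return [sum(t.predict(x) for t in part) for x in samples]

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        partials = list(pool.map(score_partition, partitions))

    # Aggregation: add the per-worker partial margins for each sample.
    return [sum(p[i] for p in partials) for i in range(len(samples))]


def to_labels(margins: List[float]) -> List[bool]:
    # Assumed XGBoost-style binary decision: positive when the summed margin is > 0.
    return [m > 0 for m in margins]


if __name__ == "__main__":
    # Two depth-1 trees ("stumps") over 2 features; leaf values chosen arbitrarily.
    stump_a = Tree([0, -1, -1], [0.5, 0.0, 0.0], [1, -1, -1], [2, -1, -1], [0.0, -1.0, 1.0])
    stump_b = Tree([1, -1, -1], [2.0, 0.0, 0.0], [1, -1, -1], [2, -1, -1], [0.0, -0.5, 0.5])
    trees = [stump_a, stump_b, stump_a]
    samples = [[0.0, 0.0], [1.0, 3.0], [1.0, 0.0]]
    old = to_labels(predict_all_trees_per_sample(trees, samples))
    new = to_labels(predict_cross_product(trees, samples, num_threads=2))
    print(old, new, old == new)
```

Both functions produce the same labels; only the order in which leaf values are summed differs, which in floating point can shift margins by rounding error but not change the work done. In CPython the GIL means this sketch shows the partitioning and aggregation pattern rather than real parallel speed-up.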

Commands:

ubuntu@ip-172-31-16-75:~/netsdb$ scripts/cleanupNode.sh

ubuntu@ip-172-31-16-75:~/netsdb$ ./scripts/startPseudoCluster.py 8 4000

ubuntu@ip-172-31-16-75:~/netsdb$ bin/testDecisionForestWithCrossProduct Y 2200000 28 275000 32 model-inference/decisionTree/experiments/HIGGS.csv_test.csv model-inference/decisionTree/experiments/models/higgs_xgboost_1600_8_netsdb XGBoost

ubuntu@ip-172-31-16-75:~/netsdb$ bin/testDecisionForestWithCrossProduct N 2200000 28 275000 32 model-inference/decisionTree/experiments/HIGGS.csv_test.csv model-inference/decisionTree/experiments/models/higgs_xgboost_1600_8_netsdb XGBoost

Execution Results: total count:2200000 positive count:1174745

Execution Time: Model Inference Time Duration: **36.6242** secs.
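Taking the ratio of the two durations reported for these runs, the measured speed-up of the cross-product implementation over the UDF-based one works out as:

```latex
\text{speed-up} = \frac{94.6148\ \text{s}}{36.6242\ \text{s}} \approx 2.58
```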

jiazou-bigdata commented 2 years ago

@venkate5hgunda and @hguan6

I just verified that my current implementation on netsDB achieves a 2.7x speed-up, with inference results identical to those of the old implementation, for the 1600-tree Higgs model on the 2.2-million-sample Higgs dataset.

Could you team up to profile where the performance improvement comes from?

The instructions to run the two implementations over Higgs are given in the pull request.

I recommend using profiling tools such as Linux perf to measure L1-cache misses, L2-cache misses, branch mispredictions, etc., on a native Linux platform.
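One way to compare the two runs is `perf stat` with CSV output, for example `perf stat -x, -o stats.csv -e L1-dcache-loads,L1-dcache-load-misses,LLC-loads,LLC-load-misses,branches,branch-misses -p <pid>` attached to the process doing the inference (in the pseudo-cluster this is presumably a worker process rather than the test client, which is an assumption). perf's generic events cover L1 and the last-level cache (LLC); L2-specific counters are CPU-specific PMU events, so the sketch below uses LLC as a stand-in. The parser and the rate names are hypothetical helpers, not part of netsDB or perf.

```python
from typing import Dict, Optional


def parse_perf_stat_csv(text: str) -> Dict[str, Optional[float]]:
    """Parse `perf stat -x,` output into {event name: count}.

    Each CSV line starts with: counter value, unit (often empty), event name, ...
    Values perf could not collect ("<not counted>", "<not supported>") map to None.
    """
    counts: Dict[str, Optional[float]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and perf's "# started on ..." header
        fields = line.split(",")
        if len(fields) < 3:
            continue
        value, event = fields[0], fields[2]
        try:
            counts[event] = float(value)
        except ValueError:
            counts[event] = None
    return counts


def ratio(counts: Dict[str, Optional[float]], numerator: str, denominator: str) -> Optional[float]:
    """Return numerator / denominator, or None if either count is missing or zero."""
    num, den = counts.get(numerator), counts.get(denominator)
    if num is None or not den:
        return None
    return num / den


def summarize(counts: Dict[str, Optional[float]]) -> Dict[str, Optional[float]]:
    # Rates to compare between the UDF-based and cross-product runs.
    return {
        "L1-dcache miss rate": ratio(counts, "L1-dcache-load-misses", "L1-dcache-loads"),
        "LLC miss rate": ratio(counts, "LLC-load-misses", "LLC-loads"),
        "branch misprediction rate": ratio(counts, "branch-misses", "branches"),
    }
```

Event names can carry modifiers on some systems (e.g. `branches:u`, or a PMU prefix on hybrid CPUs), so the lookup keys may need adjusting to match the actual output.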

Let me know if you have any questions. Thanks!

jiazou-bigdata commented 2 years ago

@venkate5hgunda Is it OK to merge the code to master? Let me know if there are any issues.

venkate5hgunda commented 2 years ago

Yes, Professor. After hardware profiling of both methods, we're seeing a significant performance improvement, and my results seem to align with your observations.

(asu-cactus/netsdb, pull request #62)