Closed jiazou-bigdata closed 2 years ago
@venkate5hgunda and @hguan6
I just verified that my current implementation on netsDB achieves a 2.7x speedup, with exactly the same inference results as the old implementation, for the Higgs-1600 tree model on the 2.2 million Higgs dataset.
Could you team up to profile both implementations and identify the reasons for the performance improvement?
The instructions to run the two implementations over Higgs are given in the pull request.
I recommend using profiling tools such as Linux perf to measure L1-cache misses, L2-cache misses, branch mispredictions, etc., on a native Linux platform.
Let me know if you have any questions. Thanks!
@venkate5hgunda Is it OK to merge the code to master? Let me know if there are any issues.
Yes, Professor. After hardware profiling of both methods, we're seeing a significant performance improvement, and my results seem to align with your observations.
Implementation 1 (The old implementation): UDF-based selection; All trees are encoded in one UDF.
Implementation 2 (The new implementation): Cross-product-based implementation. Each thread is responsible for a subset of trees. Each sample is sent to each thread for prediction. Then, prediction results will be aggregated.
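For anyone following along, here is a minimal Python sketch of the difference between the two implementations. It is not the netsDB code: the tree encoding (nested tuples with integer leaf votes) and the names `predict_tree`, `predict_single_udf`, and `predict_partitioned` are hypothetical, chosen only for illustration. It shows why the two approaches can return exactly the same results: splitting the trees across workers and summing the partial votes is just a regrouping of the same per-tree predictions. Python threads are used here only to mirror the structure, not to reproduce the speedup.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy encoding: an internal node is a tuple
# (feature_index, threshold, left_child, right_child); a leaf is an int vote.


def predict_tree(tree, sample):
    """Walk one tree from the root to a leaf and return the leaf's vote."""
    node = tree
    while isinstance(node, tuple):
        feature, threshold, left, right = node
        node = left if sample[feature] < threshold else right
    return node


def predict_single_udf(trees, samples):
    """Implementation 1 style: one function evaluates every tree per sample."""
    return [sum(predict_tree(t, s) for t in trees) for s in samples]


def predict_partitioned(trees, samples, num_partitions):
    """Implementation 2 style: each worker owns a subset of the trees,
    every sample is sent to every worker, and the partial votes are summed."""
    # Round-robin split of the trees into num_partitions subsets.
    partitions = [trees[i::num_partitions] for i in range(num_partitions)]

    def partial_votes(subset):
        # One worker: partial vote count per sample over its own trees only.
        return [sum(predict_tree(t, s) for t in subset) for s in samples]

    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(partial_votes, partitions))

    # Aggregation step: add up the partial votes for each sample.
    return [sum(votes) for votes in zip(*partials)]
```

Calling both functions on the same trees and samples should give identical vote counts for any `num_partitions`, since integer addition does not depend on grouping. With floating-point leaf values the summation order could introduce small rounding differences, so an exact match like the one reported above would then depend on how the aggregation is ordered.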