The benchmark on this page reports xgboost's histogram method on the Bosch dataset, but the reported run is slower than a single thread of an i7-7700HQ throttled to 2.8 GHz (and that run trains on even more data).
The number of threads should be set to a low value (such as 8) rather than a very high one (the p3.16xlarge run uses 64 threads); the slowdown is probably negative scaling, where adding threads makes training slower instead of faster.
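As a rough sketch of the suggestion above, the thread count can be capped before it is handed to xgboost. The helper names `capped_thread_count` and `hist_params` are hypothetical, and the cap of 8 is just the example value from this comment, not a tuned recommendation. `tree_method` and `nthread` are existing xgboost parameters; the resulting dict would be passed to `xgboost.train(params, dtrain)`.

```python
import os


def capped_thread_count(cap=8, available=None):
    """Return at most `cap` threads, never more than the machine has.

    `cap=8` is an assumed example value, not a measured optimum.
    """
    if available is None:
        # os.cpu_count() may return None, so fall back to 1.
        available = os.cpu_count() or 1
    return max(1, min(cap, available))


def hist_params(cap=8, available=None):
    """Build a minimal xgboost parameter dict for the histogram method."""
    return {
        "tree_method": "hist",  # histogram-based tree construction
        "nthread": capped_thread_count(cap, available),
    }


# On a 64-vCPU machine such as p3.16xlarge, this requests 8 threads, not 64.
params = hist_params(cap=8, available=64)
print(params)
```

Whether 8 is actually the best value depends on the dataset and hardware, so it is worth timing a few thread counts rather than assuming one.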