dmlc / mshadow

Matrix Shadow: Lightweight CPU/GPU Matrix and Tensor Template Library in C++/CUDA for (Deep) Machine Learning

Disabling OpenMP parallel pragma for CPU tensors causes performance regression #187

Open alextnewman opened 7 years ago

alextnewman commented 7 years ago

Removing OpenMP from tensor_cpu_inl.h in this commit caused a massive performance regression for us on Windows (MSVC 2013), macOS (Clang), and Linux (GCC): https://github.com/dmlc/mshadow/pull/143/commits/f225763a439e988d1b804c1144b1bed3d194e12b

Reverting this commit locally gave us a large improvement (20%+ faster training). It would be very helpful to have an option or flag that enables OpenMP parallelization for this function, so we don't have to maintain an internal fork.
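One possible shape for such an opt-in flag is a compile-time switch around the `#pragma omp parallel for`. The sketch below is a hypothetical, simplified stand-in for an element-wise CPU loop, not mshadow's actual code: the macro `ENABLE_CPU_OMP` and the function `MapRows` are invented for illustration. It uses a signed `int` loop index because OpenMP 2.0 (the version MSVC supports) requires a signed loop variable.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical opt-in switch (NOT an existing mshadow macro). Define
// ENABLE_CPU_OMP=1 and compile with OpenMP enabled to parallelize the loop.
#ifndef ENABLE_CPU_OMP
#define ENABLE_CPU_OMP 0
#endif

// Applies `op` element-wise to a row-major rows x cols matrix, writing into dst.
// Each outer iteration touches a disjoint row of dst, so the rows can be split
// across OpenMP threads without a data race (as long as `op` is thread-safe).
template <typename Op>
void MapRows(const std::vector<float>& src, std::vector<float>& dst,
             int rows, int cols, Op op) {
  assert(src.size() == static_cast<std::size_t>(rows) * cols);
  assert(dst.size() == src.size());
#if ENABLE_CPU_OMP
#pragma omp parallel for
#endif
  for (int y = 0; y < rows; ++y) {  // signed index for OpenMP 2.0 / MSVC
    for (int x = 0; x < cols; ++x) {
      const std::size_t i = static_cast<std::size_t>(y) * cols + x;
      dst[i] = op(src[i]);
    }
  }
}
```

With the macro left at its default of 0, the loop is serial and no OpenMP flags are needed. To opt in, build with e.g. `-fopenmp -DENABLE_CPU_OMP=1` on GCC or `/openmp /DENABLE_CPU_OMP=1` on MSVC. The standard `OMP_NUM_THREADS` environment variable can then cap the thread count at run time.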

szha commented 4 years ago

This code base has been donated to the Apache MXNet project per #373, and repo is deprecated. Future development and issue tracking should continue in Apache MXNet.