gitboy16 opened this issue 2 years ago
I use the default sorting algorithm for each language. Folks who use Python and want performant code typically use Numpy arrays rather than Python lists, so I included python/numpy as an additional system to benchmark.
C++/boost seems like another system that would be reasonable to include. I'd be happy to review a pull request adding it.
My benchmark machine has two physical cores and four logical cores, so it is possible multithreading could result in a speedup. For comparison, sorting in Julia 1.9 operates on a single thread.
According to a benchmark for which no source code was provided, this algorithm is 4x faster than the default sorting algorithm at 10^6 elements. That would place it between Julia-1.7 and Julia-1.9.
I don't think I understand your benchmark. It seems you are measuring a lot more than just the time it takes to sort a vector. I tested the sorting algorithm of Julia 1.9 vs Boost on large data (i.e. above 1e8; I usually work with ~1e9 values, which are floating-point outputs of models between 0 and 1) and it was behind in single-threaded mode. I use Boost in multithreaded mode because waiting 30 to 40 seconds to sort the data is too slow.
> I don't think I understand your benchmark. It seems you are measuring a lot more than just the time it takes to sort a vector.
I explain this in the README:
> I compute the difference between runtime with and without sorting to infer the sorting runtime. For high performing languages on this benchmark, the vast majority of the time is spent sorting (typically 80-90%).
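To illustrate the idea, here is a minimal sketch of the subtraction approach (not this repo's actual harness; the function name and sizes here are illustrative):

function timed_workload(n, include_sort::Bool)
    t0 = time()
    x = rand(n)                # vector generation happens in both runs
    include_sort && sort!(x)   # the only difference between the two runs
    s = sum(x)                 # consume the result so the work is not optimized away
    return time() - t0, s
end

with_sort, _    = timed_workload(10^7, true)
without_sort, _ = timed_workload(10^7, false)
println("inferred sorting time: ", with_sort - without_sort, " s")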
> it was behind in single-threaded mode
I'd be happy to add your results to the graph on the README if you make a PR to this repository.
I have tested on Julia 1.9 (commit 799a2cfab5) vs Rcpp. There is probably an overhead to executing C++ via R 3.6.1 (but it is so nice, easy, and reproducible that it is worth the cost).
The Julia code is:

using BenchmarkTools
x = rand(1_000);
@btime sort(x)
The code for Rcpp is the following:
// [[Rcpp::depends(BH)]]
#include <Rcpp.h>
#include <boost/sort/sort.hpp>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector boostsort(NumericVector x, uint32_t con_threads) {
  NumericVector y = clone(x);  // copy so the caller's vector is not mutated
  // block_indirect_sort is Boost's parallel sort; con_threads sets the thread count
  boost::sort::block_indirect_sort(y.begin(), y.end(), con_threads);
  return y;
}
Note that above I am cloning the input vector, so my timing of the Boost sort includes that additional step as well.
To run the above in R:
x = runif(1e3)
Rcpp::sourceCpp("boost_sort.cpp")
microbenchmark::microbenchmark(boostsort(x, 1L), times = 5L)
I have tested the following sizes; timings:

Size   Julia     C++ (Boost)
1e6    53.3 ms   35.5 ms
1e7    596 ms    410 ms
1e8    5.9 s     4.77 s
Personally I am more interested in multi-threaded sorting on really large data, and that is where Boost shines. FYI, I ran the above on a desktop and couldn't do 1e9 because there is not enough memory for that. I couldn't test on the server I use because I am not allowed to install Julia there. But at least now you have the code to benchmark! (The C++ version could probably be improved if done in pure C++.)
Thanks! Running C++ through R is an interesting approach to achieving reproducibility.
I got three errors in R, which I solved by installing the missing packages:

install.packages("Rcpp")
install.packages("BH")
install.packages("microbenchmark")

Then I hit a second class of error, which I got past by running this all in a directory that contains no files except for boost_sort.cpp. Now I am stuck on a third class of error:
> x = runif(1e3)
> Rcpp::sourceCpp("boost_sort.cpp")
> microbenchmark::microbenchmark(boost_sort(x,1L), times =5L)
Error in boost_sort(x, 1L) : could not find function "boost_sort"
Beyond the fact that I can't reproduce this (yet), there are two issues: this sorting does not do real work (i.e. a smart enough compiler could realize that it doesn't actually matter whether or not the sort takes place), and it repeatedly sorts the same vector. Hopefully both of these issues are minor, but in the interest of fairness, I take a slightly more robust approach for the benchmarks in this repo (see README.md for details).
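For example, here is one way to sidestep both pitfalls with BenchmarkTools (a sketch, not the exact code this repo uses):

using BenchmarkTools
# `setup` draws a fresh vector for every sample, `sum` consumes the sorted
# result so a smart compiler cannot elide the work, and `evals=1` ensures
# each freshly drawn vector is sorted exactly once.
@btime sum(sort!(x)) setup=(x = rand(10^6)) evals=1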
Sorry, please replace boost_sort with boostsort in the R part; I have edited my comment above. From the code I read in your repo, I don't think your approach is fair because you are measuring more than the time it takes to sort a vector (for example, you are measuring the time it takes to create the vector, and this can vary a lot from one language to another)...but that is just my view...
I managed to time both algorithms with 1e9 elements. Julia took 57.41 s vs Boost's 56.78 s, which is pretty impressive for a dynamic language. I noticed, however, a higher peak in memory consumption on the Julia side. Using 8 threads, Boost takes 15.30 s. For your info, we have an in-house algorithm that takes 7.21 s, and we think there is room for improvement.
As I explain in the README, I measure the time with sorting plus other things, and the time with just the other things; I subtract the two to get a figure for sorting time. This method takes into account the total impact of a sorting operation on the execution of a program. For example, a fast sorting algorithm that has a large garbage collection impact will be accurately penalized by my benchmarking strategy, whereas standard practice for microbenchmarks may mask this cost by running the garbage collector between samples.
Further, for fast languages (Julia and Java were the two I was looking at when I made this determination, but it should probably also hold for C++), sorting takes substantially more time than generating the vectors. This is because sorting is (theoretically) O(n log n) while generation is O(n). IIRC, 80-90% of the time is spent sorting, but I compensate for that 10-20% overcount fairly well by subtracting out the runtime of everything but sorting.
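You can sanity-check that split with a quick measurement like this one (a rough sketch; the exact fraction varies with hardware and n):

using BenchmarkTools
n = 10^7
t_gen  = @belapsed rand($n)         # generation only
t_both = @belapsed sort!(rand($n))  # generation plus sorting
println("≈ ", round(100 * (t_both - t_gen) / t_both, digits=1), "% of time spent sorting")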
As I note in the code, for C++ this subtraction results in a strange bug (PR welcome if you can identify how to fix it), so rather than subtracting the measured 10-20% as I do for Julia, I simply subtract a flat 15% (e.g. if the full C++ run takes 100 ms, I record 85 ms as the sorting time). This would be meaningfully unfair if C++ is substantially slower than Julia at generating random numbers. I hope this is not the case.
I have finally been able to reproduce your benchmarks! Thanks for your help. Interestingly, I get different results on my computer:
julia> for i in 1:8
x = rand(10^i)
@btime sort(x)
end
106.970 ns (1 allocation: 144 bytes)
2.549 μs (1 allocation: 896 bytes)
18.995 μs (4 allocations: 18.09 KiB)
199.390 μs (6 allocations: 164.56 KiB)
2.141 ms (6 allocations: 1.53 MiB)
28.287 ms (6 allocations: 15.27 MiB)
317.828 ms (6 allocations: 152.60 MiB)
4.349 s (6 allocations: 1.49 GiB)
> Rcpp::sourceCpp("boost_sort.cpp")
> for (i in 1:8) {
+ x = runif(10^i)
+ print(microbenchmark::microbenchmark(boostsort(x,1L), times =5L))
+ }
Unit: microseconds
expr min lq mean median uq max neval
boostsort(x, 1L) 2.581 2.856 29.0578 2.869 8.289 128.694 5
Unit: microseconds
expr min lq mean median uq max neval
boostsort(x, 1L) 4.233 4.72 10.5894 4.747 6.346 32.901 5
Unit: microseconds
expr min lq mean median uq max neval
boostsort(x, 1L) 63.525 72.491 327.7388 84.772 150.79 1267.116 5
Unit: microseconds
expr min lq mean median uq max neval
boostsort(x, 1L) 511.313 519.277 538.975 522.803 564.231 577.251 5
Unit: milliseconds
expr min lq mean median uq max neval
boostsort(x, 1L) 5.105957 5.261282 5.896635 5.611892 6.207899 7.296144 5
Unit: milliseconds
expr min lq mean median uq max neval
boostsort(x, 1L) 46.73087 48.50289 56.96083 51.99128 65.22243 72.35669 5
Unit: milliseconds
expr min lq mean median uq max neval
boostsort(x, 1L) 508.3247 538.7383 538.4224 539.5442 547.0991 558.4059 5
Unit: seconds
expr min lq mean median uq max neval
boostsort(x, 1L) 8.484083 9.013505 9.442578 9.732026 9.95213 10.03114 5
As an aside, what bit widths are you using? I can't fit 10^9 Float64s in RAM on my computer, but sorting 10^9 Float16s takes me 3.8 seconds on Julia c904ef372c81530ffbde63677010f1aa39018ed5. If you are willing to work with Float16s, they are quite easy to sort at scale. Julia uses algorithms from the 1950s that are easily parallelized, but still only runs them on a single thread. There is certainly room for improvement.
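If you want to try Float16s yourself, the setup is a one-liner (note that a 10^9-element Float16 vector still needs about 2 GB of RAM):

x = rand(Float16, 10^9)  # ~2 GB, a quarter of the Float64 footprint
@time sort!(x)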
I am using Float64. Very strange that the benchmarks are so different 🤔. I would love to see Julia running that fast vs C++ Boost on my computer!
I tried to run your benchmark again but don't see such a difference on my desktop; actually, Boost is faster on my machine for i = 6:8. Which version of R and Rtools are you using? In your benchmark above, I noticed that Julia 1.8-rc3 is slightly faster than 1.9 for i = 1:3.
It seems plausible to me that it is due to hardware differences. What type of hardware are you running?
> Which version of R and Rtools are you using?
I recently got into a spat with my AI assistant, which ran rm -rf /, so I don't know what version of R and Rtools I was previously using. But I redownloaded R and the relevant libraries today and the performance gap remains (version info is included in the provided session; it should be the latest).
I am running on Windows 10, Julia 1.9-DEV.1022 (commit 799a2cfab5). CPU: 8 x Intel(R) Xeon(R) Gold 6254 CPU @ 3.10GHz. Could it be that your version of Julia includes improvements that are not included in mine? I have tried R 4.2.1 and it is actually a bit faster than R 3.6.1.
Interesting. Your clock speed is twice mine, and yet Julia runs twice as fast on my computer. It may be a performance issue in the interaction between Julia's algorithms and your hardware. Your Julia version is 6 days old and nothing that could explain this 4x discrepancy has changed in the last 6 days.
CPU info is helpful, but comprehensive system specs would be nice, e.g. what I posted above:
Model Name: MacBook Air
Model Identifier: MacBookAir8,2
Processor Name: Dual-Core Intel Core i5
Processor Speed: 1.6 GHz
Number of Processors: 1
Total Number of Cores: 2
L2 Cache (per Core): 256 KB
L3 Cache: 4 MB
Hyper-Threading Technology: Enabled
Memory: 8 GB
System Firmware Version: 1715.81.2.0.0 (iBridge: 19.16.10744.0.0,0)
OS Loader Version: 540.80.2~11
Another option (probably better) would be to use SystemBenchmark.jl to attempt to capture general interactions between Julia and your machine. My results are here: https://github.com/IanButterworth/SystemBenchmark.jl/issues/8#issuecomment-1197475145 and instructions for generating them are at the top of that thread.
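If I recall the package's interface correctly, running it looks roughly like this (see the linked thread for the authoritative instructions):

using SystemBenchmark
res = runbenchmark()  # runs the full suite; takes a few minutes
comparetoref(res)     # compare your results against the reference machine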
Please see attached. Note I can't directly log in to GitHub with this machine, so I am uploading a photo.
And while I am at it, I attach the benchmark with R and Julia.
An initial guess I have is that your machine is about 2x faster in most ways, but mine has 2x faster memory bandwidth at the ~100 kB working-set size, which becomes a bottleneck on your computer. I'll look for a fast computer with slow 100 kB memory access to test with, but without being able to profile and fiddle, I don't think I'll be able to diagnose this performance issue remotely.
You could try fiddling around with the chunk size. For example, the default chunk-size heuristic results in a 10-bit chunk and an 8 kB counts vector at most of these sizes, which may be too large for your machine. You can override these defaults like so:
Base.Sort.radix_chunk_size_heuristic(::Integer, ::Integer, ::Unsigned) = UInt8(9)
Hi, would you consider porting Julia's sorting algorithm to a new package and adding multi-threading to it? (Not sure you can add this directly to core Julia?)
To make a good multithreaded sorting algorithm, I'd need easy access to a many-cored machine to develop and benchmark on. Unfortunately, that is not something I have right now, and I am prioritizing other improvements.
Hi, is there a way to extract the sort function (and algorithms) from Base Julia in order to improve them and build a multi-threaded version? Also, if I want to fiddle with the chunk size, it is not easy to do in Base Julia.
Julia's sorting functions and algorithms live at https://github.com/JuliaLang/julia/blob/master/base/sort.jl. You can experiment with sorting internals by redefining methods. For example, this is how I would make Base.Sort's radix sort use a chunk size of 2 bits:
Base.Sort.radix_chunk_size_heuristic(lo::Integer, hi::Integer, bits::Unsigned) = 0x02
Here is a more complex example that reveals a weakness in the existing heuristic:
julia> using BenchmarkTools
julia> @btime sort!(rand(10000));
94.583 μs (5 allocations: 164.53 KiB)
julia> const CHUNK_SIZE = Ref(0x00);
julia> Base.Sort.radix_chunk_size_heuristic(lo::Integer, hi::Integer, bits::Unsigned) = CHUNK_SIZE[]
julia> for i in 1:20
CHUNK_SIZE[] = i
print(lpad(i,2)); @btime sort!(rand(10000));
end
1 1.987 ms (5 allocations: 156.42 KiB)
2 547.792 μs (5 allocations: 156.44 KiB)
3 322.750 μs (5 allocations: 156.47 KiB)
4 219.750 μs (5 allocations: 156.53 KiB)
5 201.000 μs (5 allocations: 156.67 KiB)
6 169.333 μs (5 allocations: 156.91 KiB)
7 114.250 μs (5 allocations: 157.41 KiB)
8 101.209 μs (5 allocations: 158.53 KiB)
9 132.375 μs (5 allocations: 160.53 KiB)
10 93.875 μs (5 allocations: 164.53 KiB)
11 123.750 μs (6 allocations: 172.45 KiB)
12 91.417 μs (6 allocations: 188.45 KiB)
13 125.542 μs (6 allocations: 220.45 KiB)
14 123.167 μs (6 allocations: 284.45 KiB)
15 171.000 μs (6 allocations: 412.45 KiB)
16 244.000 μs (6 allocations: 668.45 KiB)
17 384.250 μs (6 allocations: 1.15 MiB)
18 670.958 μs (6 allocations: 2.15 MiB)
19 1.010 ms (6 allocations: 4.15 MiB)
20 2.072 ms (6 allocations: 8.15 MiB)
Apparently, a chunk size of 12 is about 3% faster than a chunk size of 10 in this case.
I have slightly changed your code to always sort the same random vector (of 10^8 elements).
# assumes CHUNK_SIZE and the radix_chunk_size_heuristic override defined above
x = rand(10^8);
for i in 1:20
    CHUNK_SIZE[] = i
    print(lpad(i, 2))
    @btime sort($x)
end
These are the results I get.
It seems that on your machine a chunk size of 10 is 1.6x faster than a chunk size of 12. Looking at these benchmark results, the default chunk size of 10 in this case seems like a great choice for both of our machines :).
Aside: please try to share text rather than physical pictures of a screen.
Sorry for that, but I don't have access to GitHub from that machine, and it is a bit of a pain to type everything by hand; I have a restricted internet connection.
Hi, is the purpose of this repo to compare the best sorting algorithm in each language? If so, would you be able to add C++ Boost's block_indirect_sort to your benchmark? It is multi-threaded as well. Thank you. Best regards