Closed prharvey closed 6 years ago
raw pointers
Before: 194ms ± 1% After: 167ms ± 1% Change: -14.19% (p=0.000 n=19)
std::shared_ptr
Before: 532ms ± 1% After: 261ms ± 2% Change: -51.01% (p=0.000 n=20)
@prharvey Have you noticed that you basically rewrote the whole thing? You have changed the code style (is there one true standard for C++ now like PEP8 for Python?), the variable names (e.g. what is exactly better in root_
over mRoot
or input
over orig
?), and dropped shared_ptr
solution down to the raw pointers, thus the idea behind the use of shared_ptr
was lost completely. I hope the author of the original C++ solution @supermina999 will give more comments on this, but I am a bit confused by this PR.
@prharvey Have you noticed that you basically rewrote the whole thing?
Yep, see title.
You have changed the code style (is there one true standard for C++ now like PEP8 for Python?)
I used clang-format, with my settings tuned for the Google Style Guide. There isn't a canonical style, but I personally prefer this one and what I had my editor set up for.
the variable names (e.g. what is exactly better in root_ over mRoot or input over orig?)
See the style guide above. Additionally, I removed abbreviation and renamed obscure variables (e.g. y -> priority, x -> value)
and dropped shared_ptr solution down to the raw pointers, thus the idea behind the use of shared_ptr was lost completely.
The shared_ptr version is still using shared_ptr to manage the lifetime of the object. That is the whole point of this exercise, is it not? The other usage of pointers isn't as a lifetime guarantee, but an indirection. They could equivalently be replaced with references - again I just prefer pointers for this use case since they can be assigned to if needed.
Well, I am fine with the raw pointers version as is is not the part of the main "naive" scoreboard, and I can accept any tricks to the tunned scoreboard. However, the shared_ptr
version is much beyond "naive" approach. I didn't completely like the original version as it didn't follow the API other languages did (they don't use mutable parameters in split/merge, instead, they return pairs/tuples), but this update to it has no point as it is not "naive" anymore and it is not fast enough to include it to the tunned scoreboard.
By naive I mean careless yet safe assignments, something that you would do in Python / Kotlin / JS / ...
I just prefer pointers for this use case since they can be assigned to if needed.
I am not a C++ expert at all, but my common sense and all the people out there keep saying: "Avoid pointers until you can't."
I am curious to know what is the root cause of the speed up for the raw pointers version; do you happen to have any insights? Is it void merge
vs NodePtr merge
or pointers VS references or pointer operations around reinterpret_cast<Node*>...
?
The updated raw pointers version manages to squeeze into 0.183s (the original version takes 0.208s).
Please, let's only update the raw pointers version and leave the "naive" shared_ptr
leave its own life.
I was under the impression the "naive" part of this benchmark was the algorithm implementation (i.e. doing a split and then join is not how one would normally implement these operations), not the coding style. With that in mind it's best to implement the same algorithm using each language's canonical style, as compiler implementors tend to optimize for the common use case - but I'm not going to push the issue.
I am not a C++ expert at all, but my common sense and all the people out there keep saying: "Avoid pointers until you can't."
Pointers are great. People are wary of them because they don't understand that memory management and pointers are orthogonal. When you realize that all pointers truly are is an indirection, they seem a lot less scary.
Anyways, the main speedup in the raw pointer version is due to making Merge eligible for tail call optimization. The rest of the changes around that help, but are small in comparison.
Finally, I updated the shared_ptr version to mimic the Java version. It isn't as fast as what I had before, however.
Before: 536ms ± 3% After: 351ms ± 3% Change: -34.44% (p=0.000 n=20)
doing a split and then join is not how one would normally implement these operations
Well, if you optimize the algorithm, that would be a completely different benchmark. To benchmark languages, we should implement the same algorithm. If we would argue that it is not the optimal algorithm I would say that given we searched nothing valuable and output a constant number, we can optimize the whole program to output that single number. I look forward to see smart compilers that would always optimize such benchmarks to nothing and optimize the natively written implementations down to the tuned ones.
With that in mind it's best to implement the same algorithm using each language's canonical style, as compiler implementors tend to optimize for the common use case
I agree with that, but that would be quite different levels of naivety for C++ and, for example, Python. I am still not sure if shared_ptr implementation is naive enough, but I just think that there is no way to implement it as naively as in JS or Python.
Finally, I updated the shared_ptr version to mimic the Java version. It isn't as fast as what I had before, however.
Thanks! I will take a look later today.
Pointers are great. People are wary of them because they don't understand that memory management and pointers are orthogonal. When you realize that all pointers truly are is an indirection, they seem a lot less scary.
Pointers are scary for reason that in a big application it is hard to reason about them. Even when you "only" use them for indirection, you may accidentally slip a pointer to some function which will have some side effects and there is no way to ensure that it is only used for indirection at compile time. Anyway, that is an off topic.
Swapped out the PRNG. 162ms ± 1%
FYI, we currently have an internal discussion about the "naivety" of certain solutions. I will merge your changes once we decide how to proceed. We are considering to move the "raw pointers" implementations in C++, Ada, and Pascal into the naive scoreboard.
I have merged the shared_ptr
solution manually in the latest commit: https://github.com/frol/completely-unscientific-benchmarks/commit/7bd3f562e45d48cab883f83a09420ec81a3bd278.
The "raw" version in this PR doesn't show any difference with our updated solution, which only touched the merge
signature. We now have two "raw" versions in the scoreboards - the original one and the tuned one.
Also, there was added a new version which uses boost::object_pool<Node>
and manages to squeeze the absolute maximum performance at the moment.
Thank you for your contribution!
Optimize the C++ raw pointer version and clean up the code.