Open lrineau opened 5 years ago
https://github.com/abseil/abseil-cpp license is Apache License 2.0. It is very permissive.
https://martin.ankerl.com/2019/04/01/hashmap-benchmarks-01-overview/ had some rather detailed comparisons recently, which seem to yield similar observations.
Shall @maxGimeno propose a global plan with the introduction of CGAL::unordered_map
+ the cmake machinery to get it everywhere?
abseil is rather inconvenient to use (without bazel or cmake): you need to link about 30 libraries to get a trivial example working :disappointed: But yes, anything that spends a lot of time in hashtable operations should be benchmarked with other implementations. Note that in one extreme case (unordered_set was taking > 64G in this application) I had better results with the standard container than with abseil.
@mglisse commented on Jul 4, 2019, 9:14 AM GMT+2:
abseil is rather inconvenient to use (without bazel or cmake): you need to link about 30 libraries to get a trivial example working :disappointed:
Hopefully CGAL uses CMake.
add_subdirectory(googletest)
add_subdirectory(abseil-cpp)
target_link_libraries(copy_polyhedron PRIVATE absl::flat_hash_map)
But yes, anything that spends a lot of time in hashtable operations should be benchmarked with other implementations. Note that in one extreme case (unordered_set was taking > 64G in this application) I had better results with the standard container than with abseil.
Our opting for Abseil will be conditional to a macro. With it we will easily be able to build benchmarks. Even users should be made aware of that, so that they can tune.
Hopefully CGAL uses CMake.
As a header-only library, it doesn't matter what CGAL uses (if it uses anything at all). Providing CGALConfig.cmake does help. But users of CGAL don't all use cmake.
@mglisse commented on Jul 4, 2019, 9:38 AM GMT+2:
Hopefully CGAL uses CMake.
As a header-only library, it doesn't matter what CGAL uses (if it uses anything at all). Providing CGALConfig.cmake does help. But users of CGAL don't all use cmake.
We do not care. Abseil will be an optional dependency of CGAL, and our CMake users will get it almost for free. As for the others... they can do experiments and benchmarks using CMake, and them make the effort in their build system if the speedup is compelling.
Actually, I do not get your point. Do you have a suggestion about an alternative?
Just my two cents, it would be good to see a higher level CGAL algorithm than copy_face_graph()
as a motivation.
@mglisse commented on Jul 4, 2019, 9:14 AM GMT+2:
abseil is rather inconvenient to use (without bazel or cmake): you need to link about 30 libraries to get a trivial example working :disappointed:
Actually, can you elaborate? Because on my machine abseil seems to require only the pthread support, and no other dependency.
@afabri it concerns everything related to unordered maps, not only copy_face_graph().
The questions is if it has measurable impact on Mesh_3, Constrained_triangulation_2, Arrangement_2......
We will probably have to bench to be sure. But I have seen traces of unordered maps at least in Mesh_3 and Arrangements_on_surface_2, so there will probbaly be an impact.
For at least one user of us, CGAL::PMP::polygon_soup_to_polygon_mesh
was a big consumer of time of their process. And it is all about maps.
Actually, can you elaborate? Because on my machine abseil seems to require only the pthread support, and no other dependency.
I didn't mean external dependencies. Using a trivial program:
#include <absl/container/flat_hash_set.h>
typedef absl::flat_hash_set<int> S;
int main(){ S s; }
Complete the command line with a minimal number of -l
flags:
g++ test.cc -I .../include -L .../lib
Here I need -labsl_hashtablez_sampler -labsl_synchronization -labsl_time -labsl_base -labsl_spinlock_wait -labsl_time_zone -labsl_int128 -labsl_symbolize -labsl_stacktrace -labsl_malloc_internal -labsl_demangle_internal -labsl_debugging_internal -labsl_dynamic_annotations -pthread
... Why they didn't at least create a thin archive or linker script so that -labsl
would work is an interesting question.
Actually, I do not get your point.
Just that abseil is a bit inconvenient to use, not as easy as using more Boost stuff for instance (even the parts that require linking with a library).
I don't know yet if you plan to test for the presence of abseil or to include a copy of it.
Do you have a suggestion about an alternative?
Not particularly. Mael posted a link to a comparison between many implementations, some of which are header-only. They don't all have Google maintaining them though, and you may wish to use more stuff from abseil later.
I would love to use the single-header implementation robin_hood.h
, that is said to be faster by those benchmarks, but it is not compatible, for the moment, with our use of -frounding-math
:
robin_hood.h:1901:59: error: ‘(8.0e+1 / 1.0e+2)’ is not a constant expression
1901 | static constexpr double factor = MaxLoadFactor100 / 100.0;
| ~~~~~~~~~~~~~~~~~^~~~~~~
@lrineau commented on Jul 4, 2019, 3:05 PM GMT+2:
I would love to use the single-header implementation
robin_hood.h
, that is said to be faster by those benchmarks, but it is not compatible, for the moment, with our use of-frounding-math
:robin_hood.h:1901:59: error: ‘(8.0e+1 / 1.0e+2)’ is not a constant expression 1901 | static constexpr double factor = MaxLoadFactor100 / 100.0; | ~~~~~~~~~~~~~~~~~^~~~~~~
I have artificially removed -frounding-math
, to compile the test, and it is slower than using abseil: 3.8 seconds instead of 3.2 using abseil.
our use of
-frounding-math
I don't remember if you tried recently to remove it, to see what breaks. But that's not the right place to discuss it.
I guess constexpr is a context where gcc should ignore rounding modes...
it is slower than using abseil
The list also includes phmap, which is described as essentially a headerified version of abseil.
Hi @lrineau, I've been experimenting with replacing the map in Unique_hash_map
with std::unordered_map
but initial results showed this to be slower than the chained_map
implementation. Probably I did something wrong or I am not understanding the use case of Unique_hash_map correctly:
https://gist.github.com/GilesBathgate/3fb78d3949ca52f4d1a0d14dcc66ae23
Would this benefit from the abseil
version of map?
I've been experimenting with replacing the map in
Unique_hash_map
withstd::unordered_map
but initial results showed this to be slower than thechained_map
implementation.
The whole point of this issue is that std::unordered_map is slow, what did you expect?
@mglisse :joy: Yes, But my reason for investigating it (prior to coming across this PR) was that in some cases, I've found anything other than chained_map
to be faster, especially when the number of elements is small because chained_map
doesn't lazily initialise minimum of 512 buckets.
For example here: https://github.com/CGAL/cgal/blob/74af3e7d33e7354e4090a1b17941a3f0c372ca8d/Nef_S2/include/CGAL/Nef_S2/SM_const_decorator.h#L392
where visited only contains approximately 6 to 10 elements. The chained_map construction dominates the CPU usage, making O(1) lookup irrelevant probably a std::vector
with O(n) would be faster in this scenario. Another example where Unique_hash_map construction is slow is #5143
Hi @lrineau, I've been experimenting with replacing the map in
Unique_hash_map
withstd::unordered_map
but initial results showed this to be slower than thechained_map
implementation. Probably I did something wrong or I am not understanding the use case of Unique_hash_map correctly:gist.github.com/GilesBathgate/3fb78d3949ca52f4d1a0d14dcc66ae23 (too large to embed)
Would this benefit from the
abseil
version of map?
Hi Giles, I do not have the time to investigate on your issue. I hope others will, though.
The best way to know if the Abseil hash maps can benefit your code is to try, anyway.
@lrineau Thanks I've given it a try
https://gist.github.com/GilesBathgate/3fb78d3949ca52f4d1a0d14dcc66ae23
@mglisse Abseil was 20% slower than std:unordered_map
, for me though so I'm confused now.
@lrineau @mglisse I am getting good results with robin_hood.h
I use robin_hood::unordered_node_map
(The issue with -frounding math was resolved https://github.com/martinus/robin-hood-hashing/issues/51)
Updated gist: https://gist.github.com/GilesBathgate/3fb78d3949ca52f4d1a0d14dcc66ae23
where visited only contains approximately 6 to 10 elements
Did you try a flatmap for this one? (I know nothing about that code, and in particular not if the size of this container can ever be large) Such small maps are not a typical use case for hash maps, so many are probably not optimized for that case. The code you point to seems suboptimal, it calls visited[v]
twice in a row, changing the code a bit, a (hash)set would make more sense than a map (it probably used a map because there was no set in cgal), checking visited.insert(v).second
.
Nice if robin hood works well, although if you want to change Unique_hash_map, which is used a lot in CGAL, and not just use a different map in one specific place, reviewers are likely to ask for various benchmarks, to make sure it doesn't regress some other use cases (bigger table, more expensive key, etc). But it seems quite possible that it is indeed faster all around.
@mglisse Yes, I tried a boost::flat_set
and it was much faster I'll add a comment about that to #5379
Regarding Unique_hash_map
if I understand correctly it's optimised for when the keys are known to be unique and therefore no collisions. That said robin_hood does seem to out-perform chained_map in my tests, especially in the object construction. Either way, I've found some that some cases require robin_hood::unordered_node_map
instead of robin_hood::unordered_flat_map
So it doesn't get the cache friendliness benefits. I was thinking of asking @martinus if there is a way to use robin_hood::*
in a way that is optimal for unique hash maps.
If you know the maximum size of the hashmap, it is beneficial to call reserve()
with that amount. That way you only get a single allocation. Not much else you can do as far as I see on a quick glance.
Also, as @mglisse said, the linked code looks suboptimal. E.g. unordered_flat_set
might help, and I think the std::list could use a custom allocator so it doesn't malloc & free each time you put an element into it. A while ago I've created an allocator for that: https://gist.github.com/martinus/ee4cc9d82d4610647180301c47b030a2 see e.g. this benchmark: https://quick-bench.com/q/BLW3OTlodJ2c3TmQGJUPT_TAdGA
the std::list could
... be wrapped in a std::queue, which could then use its default backend deque :wink: (I didn't profile that code, no idea what part is relevant to optimize)
If you know the maximum size of the hashmap, it is beneficial to call
reserve()
with that amount.
So this could be done:
- CGAL::Unique_hash_map<SVertex_const_iterator,bool> visited(false);
+ robin_hood::unordered_flat_set<SVertex_const_iterator,Handle_hash_function> visited;
+ visited.reserve(number_of_svertices());
but unfortunately:
:disappointed:
@mglisse Presumably one can use return CGAL::iterator_distance(svertices_begin(), svertices_end());
(or just std::distance) in number_of_svertices
?
@mglisse Presumably one can use
return CGAL::iterator_distance(svertices_begin(), svertices_end());
(or just std::distance) innumber_of_svertices
?
Probably, but why? It would perform exactly the same iteration.
Probably, but why? It would perform exactly the same iteration.
Because I thought it was a random access iterator, so last-first
, it's not of course, so you are right.
I know nothing about that code, and in particular not if the size of this container can ever be large
@mglisse Yes it can be large if you have a cone with 10,000 sides, the sphere_map of the apex will have 10,000 svertices. so visited would need that many elements. robin_hood::unordered_flat_map
with .reserve does seem to be a good option though.
Issue Details
Inspired by one of the cppcon talks I listen to during my free time, I have quickly tried the hash maps from Abseil instead of
std::unordered_map
.My try was a modification of
copy_face_graph
:The results are stuning: now 3.2 seconds instead of 7.1 seconds
Source Code
The .cpp code: https://gist.github.com/2e9ea602a7b1e46cd34773611acac3c6
Here is the local patch to
CMakeLists.txt
(a non-clean hack):I had to checkout abseil and gtest:
I ran that modified
copy_polyhedron.cpp
ondemo/Polyhedron/data/man.off
, and as I said the speedup is huge: 3.2s after the patch to<CGAL/boost/graph/copy_face_graph.h>
, instead of 7.1s.Environment
-O3