unordered_map is a hash table. Hash tables are fast; they have O(1) insertion and access time. map is a balanced binary tree; these are slower with log(N) insertion and access times. The only downside of a hash table is that it can take more space than a balanced binary tree, though it doesn't always.
I test this PR by putting return 0 in main.cpp, like so:
And then measure runtime on a dataset with 56,977,804 edges using /usr/bin/time -v.
The existing code has
User time (seconds): 92.25
Maximum resident set size (kbytes): 9907208
while the new code has
User time (seconds): 41.86
Maximum resident set size (kbytes): 8471668
So we see that using unordered_map gets the data loaded in less than half the time while using less space!
Note that the C++ STL hash table is generally a slow implementation since it uses buckets instead of linear probing. See for instance this SO answer and also this performance comparison. Still, a 2x speed-up without any additional dependencies is pretty decent.
unordered_map
is a hash table. Hash tables are fast; they have O(1) insertion and access time.map
is a balanced binary tree; these are slower withlog(N)
insertion and access times. The only downside of a hash table is that it can take more space than a balanced binary tree, though it doesn't always.I test this PR by putting
return 0
inmain.cpp
, like so:And then measure runtime on a dataset with 56,977,804 edges using
/usr/bin/time -v
.The existing code has
while the new code has
So we see that using
unordered_map
gets the data loaded in less than half the time while using less space!Note that the C++ STL hash table is generally a slow implementation since it uses buckets instead of linear probing. See for instance this SO answer and also this performance comparison. Still, a 2x speed-up without any additional dependencies is pretty decent.