franktakes / teexgraph

C++ library for large-scale network analysis and computation
GNU General Public License v3.0
24 stars 8 forks source link

Use unordered_map for input-internal mapping #6

Closed r-barnes closed 3 years ago

r-barnes commented 3 years ago

unordered_map is a hash table. Hash tables are fast; they have O(1) insertion and access time. map is a balanced binary tree; these are slower with log(N) insertion and access times. The only downside of a hash table is that it can take more space than a balanced binary tree, though it doesn't always.

I test this PR by putting return 0 in main.cpp, like so:

    Graph G;
    if(!G.loadUndirected(argv[1]))
        return 1;

    return 0;

And then measure runtime on a dataset with 56,977,804 edges using /usr/bin/time -v.

The existing code has

User time (seconds): 92.25
Maximum resident set size (kbytes): 9907208

while the new code has

User time (seconds): 41.86
Maximum resident set size (kbytes): 8471668

So we see that using unordered_map gets the data loaded in less than half the time while using less space!

Note that the C++ STL hash table is generally a slow implementation since it uses buckets instead of linear probing. See for instance this SO answer and also this performance comparison. Still, a 2x speed-up without any additional dependencies is pretty decent.

franktakes commented 3 years ago

Definitely a good one!