Tessil / ordered-map

C++ hash map and hash set which preserve the order of insertion
MIT License
512 stars 66 forks source link

`tsl::ordered_set<bool>` with `std::vector` #34

Open adamsol opened 3 years ago

adamsol commented 3 years ago

The readme says it's possible to use the containers with std::vector instead of std::deque. However, after changing that in ordered_set.h, the following code crashes under GCC 7.2.0, probably due to the strange specialization of std::vector<bool>.

#include <iostream>

#include "include/tsl/ordered_set.h"

int main() {
    tsl::ordered_set<bool> a;
    a.insert(true);

    tsl::ordered_set<bool> b;
    b.insert(a.begin(), a.end());

    std::cout << b.size() << std::endl;
}

There is also a warning:

lib/tsl/ordered_hash.h:387:43: warning: returning reference to temporary [-Wreturn-local-addr]
     reference operator*() const { return *m_iterator; }
                                           ^~~~~~~~~~

A workaround is to keep std::deque just for bool.

class ValueTypeContainer = typename std::conditional<
    std::is_same<Key, bool>::value,
    typename std::deque<Key, Allocator>,
    typename std::vector<Key, Allocator>>::type,

Is this the reason why std::deque is used by default? I've found that std::vector makes the containers somewhat faster (though they're much faster than the default unordered containers in STL anyway).

Tessil commented 3 years ago

The main reason to use std::deque as default is that it allows faster rehashes as we don't have to move all the values when growing the container.

Is there any special reason to use a tsl::ordered_set<bool> as the container will only be able to contain a maximum of two values and doesn't seem really useful at first glance? I suppose it's due to a templated code that use tsl::ordered_set?

adamsol commented 3 years ago

Sure, set<bool> isn't very useful. I'm trying to use this library as the implementation of sets and dictionaries in my programming language, and this crash appeared in my tests.

The performance may depend on what exactly we are testing. It seemed for me that std::vector should generally be faster than std::deque for random access and insertion at the end (https://thispointer.com/deque_vs_vector/). However, there are some benchmarks showing that it's not so clear: https://baptiste-wicht.com/posts/2012/12/cpp-benchmark-vector-list-deque.html, http://blog.davidecoppola.com/2014/05/cpp-benchmarks-vector-vs-list-vs-deque/. Additionally, std::deque behaves better in terms of iterator invalidation (references to values are not invalidated during the rehashing), so it may indeed be more beneficial.

Anyway, I guess this issue is more about the C++'s design flaw regarding std::vector<bool>, and the library works correctly, so feel free to close this. Though it might be worth adding some description in readme about why std::deque was chosen as the default and what are the possible consequences of changing it to std::vector.