Tessil / ordered-map

C++ hash map and hash set which preserve the order of insertion
MIT License
512 stars 66 forks source link

Eliminating the difference in what an iterator yields #32

Open ryanmolden opened 3 years ago

ryanmolden commented 3 years ago

From your documentation you have this:

For iterators, operator*() and operator->() return a reference and a pointer to const std::pair<Key, T> instead of std::pair<const Key, T> making the value T not modifiable. To modify the value you have to call the value() method of the iterator to get a mutable reference. Example:

Is this something that could be eliminated? I.e. the divergence between ordered_map and unordered_map? I have code that iterates over an unordered_map in parallel using std::for_each. I need to modify the T values but with ordered_map I can't do this because the callback provided to for_each receives the dereferenced iterator, i.e. I end up with a const std::pair<Key, T> and thus have no way of calling value to get a modifiable T.

It also pops up in using the ranged-for loops:

for(auto& foo : some_ordered_map)

foo here is again const std::pair<Key, T> and thus I can't possibly call value even if I wanted to, meaning I have to change all ranged-for loops to be the older, non-ranged style.

Tessil commented 3 years ago

Hello,

Compared to a std::unordered_map which stores std::pair<const Key, T> the tsl::ordered_map has to store std::pair<Key, T> to be able to support move-only Key. As the std::unordered_map stores the pairs inside a node it can easily move around the pointer node, on the other hand the tsl::ordered_map stores the pairs inside a std::vector/std::deque and setting the Key as const inside the pair would force a call to the copy-constructor of Key when the pair must be moved.

The possible design choices I thought of:

I picked-up the current solution as a compromise to avoid the undefined behaviour or error-prone solutions aforementioned. There may be a better solution that I may have missed and any suggestion is welcome.

doug-moen commented 3 years ago

@Tessil Instead of returning a std::pair, the iterators could instead return a custom pair type, eg a tsl::ordered_map::pair value. This new type would replicate the structure of a std::pair, but add some additional structure that would allow for mutating the 'second' field.

Tessil commented 3 years ago

A custom pair would solve the range-based for-loop problem though it would not really be possible to expose a .first member. It'd break backward compatibility while still not being a drop-in iterator replacement for std::unordered_map::iterator.

A lot of issues are open on this .value() problem (https://github.com/Tessil/hopscotch-map/issues/23, https://github.com/Tessil/hopscotch-map/issues/49, https://github.com/Tessil/robin-map/issues/4 and https://github.com/Tessil/ordered-map/issues/28) and I understand it can be quite annoying and unintuitive when migrating to a tsl hash map. Maybe the best solution would be to just return a std::pair<Key, Value>& and put a warning that modifying the .first element of the pair is undefined behaviour but it can bring hard to track problems.

mlogan commented 3 years ago

Interesting. I switched to tsl::sparse_map from absl::flat_hash_map, which was transparent except that I had to add:

       const_cast<T*>(&(it->second));

in one spot, due to this issue. So absl::flat_hash_map doesn't present this difficulty despite being a flat hash table. Have you happened to look at how they did it?

Tessil commented 3 years ago

Hi,

From my understanding absl::flat_hash_map uses an union of std::pair<K, V> and std::pair<const K, V> when absl::container_internal::memory_internal::IsLayoutCompatible<K, V> value is true and takes advantage of:

// Accessing one of the union fields while the other is active is safe as
// long as they are layout-compatible, which is guaranteed by the definition of
// kMutableKeys. For C++11, the relevant section of the standard is
// https://timsong-cpp.github.io/cppwp/n3337/class.mem#19 (9.2.19)

(source)

If the two pairs are not layout compatible it is forced to copy K when resizing the map.

Example:

#include <functional>
#include <iostream>

#include "absl/container/flat_hash_map.h"

class A {
 public:
  A(int i) : m_i(i) { std::cout << "constructor" << std::endl; }

  A(const A& other) {
    std::cout << "copy constructor" << std::endl;
    m_i = other.m_i;
  }

  A(A&& other) noexcept {
    std::cout << "move constructor" << std::endl;
    m_i = other.m_i;
  }

  A& operator=(const A& other) {
    std::cout << "copy assignement" << std::endl;
    m_i = other.m_i;
    return *this;
  }

  A& operator=(A&& other) noexcept {
    std::cout << "move assignement" << std::endl;
    m_i = other.m_i;
    return *this;
  }

  virtual void foo() {}

  bool operator==(const A& other) const { return m_i == other.m_i; }

 public:
  int m_i;
};

namespace std {
template <>
struct hash<A> {
  size_t operator()(const A& a) const { return hash<int>()(a.m_i); }
};
}  // namespace std

int main() {
  std::cout << absl::container_internal::memory_internal::IsLayoutCompatible<
                   A, int>::value
            << std::endl;

  absl::flat_hash_map<A, int> map = {{A(1), 1}, {A(2), 2}};
  std::cout << "reserve" << std::endl;
  map.reserve(32);
}

Output:

0
constructor
move constructor
constructor
move constructor
copy constructor
copy constructor
copy constructor
reserve
copy constructor
copy constructor

If you remove the virtual method the layouts will be compatible and you then have:

1
constructor
move constructor
constructor
move constructor
copy constructor
move constructor
copy constructor
reserve
move constructor
move constructor

If you replace absl::flat_hash_map by tsl::robin_map you'll have the latest output in both cases as it always stores a std::pair<K, V> (but at the price of a value() method).

The crux of the problem is that I would like to store a std::pair<K, V> to be able to move K but the interface should ideally return a std::pair<const K, V>& and I don't think it's possible to do it in a standard-compliant way for every possible K and V.