Tessil / ordered-map

C++ hash map and hash set which preserve the order of insertion
MIT License
512 stars 66 forks source link

Add an erase_if function #37

Closed rkaminsk closed 1 year ago

rkaminsk commented 2 years ago

This PR adds an erase_if function to the map that removes all elements that satisfy a predicate. The function preserves the insertion order shifting elements if necessary. It runs in O(n). I think that with the current interface it is only possible to implement something running in O(n^2).

C++20 adds a free-standing erase_if function for maps. The function here is defined in a similar way.

Tessil commented 2 years ago

Thanks, that would be a useful feature.

I wonder though if there isn't a problem in the code where the last bucket is never cleared. If I do:

tsl::ordered_map<int, int> map{{1, 1}, {0, 2}};
erase_if(map, [](value_type &x) { return x.second == 2; });

The value {0, 2} is removed from m_values but the corresponding bucket_entry is never cleared or am I missing something?

rkaminsk commented 2 years ago

I wonder though if there isn't a problem in the code where the last bucket is never cleared.

You are right. I'll provide a fix.

Tessil commented 2 years ago

Thanks for the update.

It seems also that no backward shift is done on erasure (see the already existing erase functions in ordered-map or https://codecapsule.com/2013/11/17/robin-hood-hashing-backward-shift-deletion/). After erasing the elements, the remaining ones could not be at their ideal location anymore and the find could fail.

Example:

  using value_type = tsl::ordered_map<int, int>::value_type;
  tsl::ordered_map<int, int> map{{0, 2}, {16, 2}, {24, 2}, {5, 5},
                                 {6, 2}, {7, 7},  {8, 8},  {9, 9}};
  erase_if(map, [](value_type& x) { return x.second == 2; });

  BOOST_CHECK(map.at(5) == 5);
  BOOST_CHECK(map.at(7) == 7);
  BOOST_CHECK(map.at(8) == 8);
  BOOST_CHECK(map.at(9) == 9);
rkaminsk commented 2 years ago

It seems also that no backward shift is done on erasure.

Maybe I should have spend some more time looking at the erase function. Just skimmed and though the backward shifting is about the vector. :sloth:

I'll fix this one too and also add a better unit test. I looks like it's just about calling backward_shift as done in erase_value_from_bucket.

rkaminsk commented 2 years ago

The problem should be fixed. I also made sure that the values in the container are passed by const reference to the predicate.

rkaminsk commented 1 year ago

It's been a while. Is there anything else that should be done?

Tessil commented 1 year ago

Note also we may have to think about the strong exception guarantee of this function. What happen if an exception is thrown while we already have cleared some buckets? Would the map be in a undefined state? I have to think a bit about it as it may be quite complicate to achieve the strong exception guarantee.

rkaminsk commented 1 year ago

As long as the value_type is noexcept movable, the function should be nothrow. Otherwise, I don't think that any reasonable guarantees can be given. (Maybe that deleting at the end does not throw either.)

EDIT: The exception specification for the erase function of the vector container can probably serve as guidance:

rkaminsk commented 1 year ago

I have to think a bit about it as it may be quite complicate to achieve the strong exception guarantee.

If you really want to have the strong exception guarantee, one would have to copy the values into a new container if neither the move nor the assignment are nothrow. Since the container is a mixture of an map and a vector, one could avoid this complexity because the vector also does not provide a strong exception guarantee. Maps probably do because neither moves nor assignments are involved.

Tessil commented 1 year ago

The ordered-hash only provides strong-exception guarantee if T is a noexcept movable-type or copyable one:

Concerning the strong exception guarantee, it holds only if ValueContainer::emplace_back has the strong exception guarantee (which is true for std::vector and std::deque as long as the type T is not a move-only type with a move constructor that may throw an exception, see details).

My main worry is that Hash, KeyEqual or the Predicate could throw for some unlikely reason during the erasure which could corrupt the map. I don't think there's an easy and clean way to overcome the problem though..

rkaminsk commented 1 year ago

My main worry is that Hash, KeyEqual or the Predicate could throw for some unlikely reason during the erasure which could corrupt the map. I don't think there's an easy and clean way to overcome the problem though..

A throwing predicate could be handled by stopping to delete if an exception is thrown. Throwing Hash or KeyEqual functions are more difficult. One would have to store the results first before actually deleting. A safe version (pseudo-code) could be:

void erase_if(ordered_map &map, auto pred) {
  auto cpy = ordered_map();
  for (auto &&x : map) {
    if (!pred(x)) {
      cpy.insert(x);
    }
  }
  cpy.swap(map);
}

Not really what I would expect from such a function. Maybe simply document that the function is not exception safe?

rkaminsk commented 1 year ago

Hello, it has been a while. What about adding the following warning to the documentation.

This function is not exception safe. In case of an exception, the container must not be used any more. It can still be safely destroyed or cleared, though. Exceptions can be avoided by providing non-throwing move assignments for keys and values, hashers and equality comparisons, and comparison predicates.

The function will still be very useful in practice because the mentioned functions can often be implemented with nothrow guarantees. And even if not, implementing larger systems with full rollback guarantees is very challenging. Developers have to pay a lot of attention. This includes reading the documentation. :wink:

Tessil commented 1 year ago

Sorry for the delay.

I don't see an easy way to make it exception safe and I understand it can be a quite useful addition. Adding a warning in the documentation would be good enough and eventually in the README near the "strong exception guarantee" mention. I'll merge it after.

Thanks.

rkaminsk commented 1 year ago

Perfect! I should be able to do the update this this week.

rkaminsk commented 1 year ago

I added some lines about exception safety. Please have a look.

EDIT: the test failure is just because of a failed download. If you rerun it, it would probably pass.

Tessil commented 1 year ago

Thanks, I merged it.

rkaminsk commented 1 year ago

Thanks!