TooBiased / growt

This is a header only library offering a variety of dynamically growing concurrent hash tables. That all work by dynamically migrating the current table once it gets too full.
Other
106 stars 13 forks source link

Let UpdateFunction decide to cancel update after seeing current data #4

Open leezu opened 6 years ago

leezu commented 6 years ago

Hey Tobias, first thanks a lot for your work on this library!

I noted you made extensive experiments in your paper regarding the performance under contention for a subset of keys (ie. Zipf distributed keys). In the particular case where one wants to count the occurrence of the keys in some dataset, contention can be reduced by using statistical counters (e.g. Dice et. al. "Scalable statistics counters."). Such counters provide a tradeoff between statistical guarantees about the accurateness of a count versus the number of writes necessary to the counter. In the concurrent case the number of writes is the number of CAS operations. For more contended counters, the probability of requiring a write during counter increment decreases.

The increment opreation of such counter is quite simple to implement:

  void inc(uint32_t random) {
    while (true) {
      double seenT = threshold.load();
      if (random > seenT) {
        return;
      }
      bool overflow = (seenT < (a + 1.0));
      double newT = seenT * probFactor;
      if (overflow) newT = (double)MaxInt;

      if (std::atomic_compare_exchange_weak(&threshold, &seenT, newT)) {
        return;
      }
    }
  }

Based on a random number, either an update of the stored value is performed or the stored value is left as it is. In case of CAS failure the operation is retried.

It would be very useful, to store such statistical counters as values in a concurrent hash map. However, in the current implementation, whenever update or insertOrUpdate is called, CAS will be executed. The UpdateFunction has no option to cancel the update.

I believe only few changes would be required to make this work though. 1) UpdateFunction must return bool to indicate if the update shall be cancelled or not 2) In TAtomic::execute (for SimpleElement) and atomicUpdate (for MarkableElement) check the return value of UpdateFunction and skip the CIS if false is returned (still indicating that the update was successful).

Would you suggest any other / better solution? Would you welcome a contribution to the library to implement this feature?

Unrelated: Your paper describes a strategy to support complex key and value types. Do you have any plans to support it for growt? I have a use-case for complex keys (strings) and if it is feasible without too many changes, would be happy to use / adapt growt for that.

TooBiased commented 6 years ago

Hi Leezu, thanks for the interest in my library and in my research. Currently I am on vacation, but I will get back to you once I have returned (the week after next).

Approximate Counters seem like a very valid storage option, and I think there could be other cases where updates might have to be cancelled, e.g., when computing the min of different values. I will try to make the feature available in some way.

The case with complex keys is somewhat more difficult. Currently there is no short term plan, to make this feature available. The main problem is the interface -- how to reduce copies/how to handle pointer ownership. If I have a good idea, this could happen sooner rather than later, but there are no immediate plans.

Cheers, Tobias'