Closed milianw closed 2 years ago
Hi Milian! Yeah I think this is a good point. I had imagined that somebody who wanted to construct values using a custom function could create a custom type, so something like the following:
template <typename T>
struct ObjWrapper {
template <typename F>
ObjWrapper(F constructor) : obj(constructor()) {}
T obj;
};
int main() {
std::shared_mutex mutex;
std::vector<std::string> indexToString;
libcuckoo::cuckoohash_map<std::string, ObjWrapper<size_t>> stringToIndex;
auto stringForIndex = [&](size_t index) {
auto lock = std::shared_lock(mutex);
return indexToString[index];
};
auto indexForString = [&](const std::string& string) {
size_t ret_index;
stringToIndex.upsert(
string,
[&ret_index](const ObjWrapper<size_t>& wrapper) {
ret_index = wrapper.obj;
},
[&] {
auto lock = std::unique_lock(mutex);
ret_index = indexToString.size();
indexToString.push_back(string);
return ret_index;
});
return ret_index;
};
...
}
The indexForString
function is a bit more cumbersome than what you have written, but I think it should be the same efficiency?
Yes, you are right - using a different type would work too - but I have to ask: Why not simply adapt the API of cuckoohash_map to make this simple and explicit? I mean there are already some _fn types, this one is just missing?
Ping? Anything i can do to move this forward? Or are you adamant about moving this to the type side instead?
Sorry for the late reply @milianw. I am not opposed to modifying the API, but I do feel that adding an entirely new function like this may become confusing. Note that the functions passed to the _fn
methods are for mutating the value in place, once it is already in the table (essentially letting the user do arbitrary operations on the mapped_type&
under the lock), so repurposing that naming convention to mean "constructor" function may be confusing.
Furthermore, it would be nice to stick to an API that is analogous to the STL, which doesn't implement any special overrides for constructing-from-function. However, it is definitely easier to mutate a value after default construction in STL, with something like this:
auto [it, is_new] = map.try_emplace(key /* value will be default-constructed */);
if (is_new) {
// Set it->second to whatever is desired.
}
So maybe we can emulate this functionality by augmenting the _fn
in uprase_fn
, so that you can pass in a functor which implements operator()(mapped_type&, bool /* is_new */)
. If you pass in such a functor, uprase_fn
will invoke the functor both when the key was already in the map and if it was newly-inserted, and the user can use the is_new
boolean to distinguish between cases. This differs from the current behavior, where the functor is only invoked if the key is already in the map.
Does this sound like a reasonable API?
Hey there.
Yes, your proposal could work, I'm not overly attached to my proposal. Just as a suggestion though, I would not use a plain bool argument but rather some kind of enum class
to make this API more explicit and self explanatory what the bool
state means.
Cheers
Hi @milianw, just created https://github.com/efficient/libcuckoo/pull/141, which adds the functionality we discussed. Let me know if this looks alright, and I'll merge it in!
superseded by https://github.com/efficient/libcuckoo/pull/141
This new API makes it trivial to leverage libcuckoo for string interning or similar areas, where constructing a value is potentially expensive or requires further book keeping:
Note how we only need to acquire the expensive unique_lock when we are inserting a new string key value. In the hopefully common case where the string was already inserted previously, we can rely on the shared libcuckoo locking to reduce lock contention.
I do not see any way to leverage the existing libcuckoo API in a similar manner - but maybe I'm missing something? Note how I cannot use something like the following, as it would contain race conditions for the size call and subsequent insertion into the indexToString vector: