Open MarkCiliaVincenti opened 1 year ago
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.
Can't you determine whether the value was added by comparing it with the returned value?
Tagging subscribers to this area: @dotnet/area-system-collections See info in area-owners.md if you want to be subscribed.
Author: | MarkCiliaVincenti |
---|---|
Assignees: | - |
Labels: | `api-suggestion`, `area-System.Collections`, `untriaged` |
Milestone: | - |
Can't you determine whether the value was added by comparing it with the returned value?
First of all no, because the value you get will often be equal to the one you're trying to add.
Secondly, even if you could, it's suboptimal having to compare them. Then you might as well use the very first example and only do the GetHashCode() once, when you fail to get, rather than twice in every scenario (to compare the one you tried to add with the one you got/added):
while (!TryGetValue(key, out releaser) || !releaser.TryIncrement())
{
if (TryAdd(key, releaserToAdd))
{
return releaserToAdd;
}
}
I'm trying to optimize the above. It's wasteful having to do a GetHashCode() again for the inner TryAdd given that the TryAddInternal method is private.
Also, related to this, because the idea at the end of the day is to avoid unnecessary multiple calls to GetHashCode(), what is the reasoning behind TryGetValueInternal and TryAddInternal being private? In very high concurrency you may need to alternate between gets and adds multiple times until you finally manage to get or add, if you're locking and updating after the get, so I don't understand why the methods which accept a hashcode are made inaccessible.
First of all no, because the value you get will often be equal to the one you're trying to add.
That sounds weird to me. Can you explain more about your use case? If you have some other way of getting the value associated with the key, why do you need the dictionary in the first place?
Secondly, even if you could, it's suboptimal having to compare them.
It shouldn't be. If the values are reference types, then ReferenceEquals
is very cheap. If the values are small value types, then comparing for equality should also be cheap. That leaves large value types, which should be rare.
First of all no, because the value you get will often be equal to the one you're trying to add.
That sounds weird to me. Can you explain more about your use case? If you have some other way of getting the value associated with the key, why do you need the dictionary in the first place?
Secondly, even if you could, it's suboptimal having to compare them.
It shouldn't be. If the values are reference types, then
ReferenceEquals
is very cheap. If the values are small value types, then comparing for equality should also be cheap. That leaves large value types, which should be rare.
My use case is for https://github.com/MarkCiliaVincenti/AsyncKeyedLock/blob/6.0.2/AsyncKeyedLock/AsyncKeyedLockDictionary.cs
I'm adding an instance of a class which has an int property. When added, that int needs to be 1. Every time I read it, I need to increase it by 1. In a way it is a counter. Now if it's just been added (so it's in the dictionary with value of 1) then there's no knowing whether it was added or whether it was read from the dictionary. Both would have the value of 1 but when it's read from the dictionary you'd want to change it to a 2.
As for the micro-optimization part... TryGetValueInternal, TryAddInternal and TryUpdateInternal exist simply so methods like GetOrAdd don't go through GetHashCode() twice.
Why that optimization is not provided to whoever cannot use GetOrAdd I don't exactly understand why.
Ideally above line 83 in https://github.com/MarkCiliaVincenti/AsyncKeyedLock/blob/6.0.2/AsyncKeyedLock/AsyncKeyedLockDictionary.cs I would GetHashCode() and pass them on to all subsequent TryGetValue and TryAdd calls. But because the methods accepting a hashcode are private, I can't, so I will repeatedly end up compiling hashcodes with no way of telling how long that would take especially in the case of complex objects.
As I mentioned, since the value to add is a reference type that is not the same as the value that's already in the dictionary, you can use ReferenceEquals
to see whether that value was added. No new API needed.
As I mentioned, since the value to add is a reference type that is not the same as the value that's already in the dictionary, you can use
ReferenceEquals
to see whether that value was added. No new API needed.
With what I am trying to achieve, I don't think it's possible for GetHashCode() to get computed just once, except for in the happy path.
As I mentioned, since the value to add is a reference type that is not the same as the value that's already in the dictionary, you can use
ReferenceEquals
to see whether that value was added. No new API needed.With what I am trying to achieve, I don't think it's possible for GetHashCode() to get computed just once, except for in the happy path.
Why would you need a hash code to determine reference equality?
As I mentioned, since the value to add is a reference type that is not the same as the value that's already in the dictionary, you can use
ReferenceEquals
to see whether that value was added. No new API needed.With what I am trying to achieve, I don't think it's possible for GetHashCode() to get computed just once, except for in the happy path.
Why would you need a hash code to determine reference equality?
The hashcode gets computed every time you call TryGetValue, TryUpdate, TryAdd etc.
Sometimes, including in my code, you need to call TryGetValue, TryAdd etc multiple times, sometimes in a while(true). These methods call GetHashCode() every time. The methods that accept an already-computed hashcode are marked as not exposed publicly.
Here's similar code that's also calling TryGetValue, TryUpdate and TryAdd within a while loop, perhaps you are familiar with the SixLabors ImageSharp libraries: https://github.com/SixLabors/ImageSharp.Web/blob/main/src/ImageSharp.Web/Synchronization/RefCountedConcurrentDictionary.cs
Code like this would benefit from exposing the methods that accept the hashcode because then you are able to call GetHashCode once. Under heavy load, with my library or with ImageSharp.Web or with other libraries like https://github.com/amoerie/keyed-semaphores/blob/main/KeyedSemaphores/KeyedSemaphoresCollection.cs, GetHashCode could be called several times, needlessly.
I think what @svick suggested is, assuming your value is a reference type, you could try doing the following:
TValue result = dict.GetOrAdd(key, value);
bool added = ReferenceEquals(value, result);
With a little bit of work, it should be possible to extend that to the overload that accepts a value factory.
I think what @svick suggested is, assuming your value is a reference type, you could try doing the following:
TValue result = dict.GetOrAdd(key, value); bool added = ReferenceEquals(value, result);
With a little bit of work, it should be possible to extend that to the overload that accepts a value factory.
I am already using GetOrAdd, but I still need it in a loop, and I still need to do the TryGetValue and TryAdd before.
What I'm trying to do here, using a ConcurrentDictionary and updating the reference counter in a thread safe manner and removing it when the counter reaches 0, is simply impossible to do with the current API without GetHashCode() getting called multiple times.
So yes, there clearly is a need for this API, because it would allow me, and others in my situation, to write more optimal code.
@MarkCiliaVincenti the motivation for the proposed GetOrAdd
overload with the out bool added
parameter is to use it like this:
releaser = GetOrAdd(key, releaserToAdd, out bool added);
The counter-argument is that the added
can ne infered using the existing APIs, like this:
releaser = GetOrAdd(key, releaserToAdd);
bool added = ReferenceEquals(releaser, releaserToAdd);
So it seems that the proposal is not opening new possibilities for using the ConcurrentDictionary<TKey,TValue>
collection. Specifically it's not reducing the number of times that the GetHashCode
has to be called in any scenario. Maybe you have in mind some other proposal, that would involve exposing publicly methods that are currently private or internal?
@theodorzoulias yes I am now proposing making TryGetValueInternal, TryAddInternal, TryUpdateInternal public.
Although the original proposal also still holds. Using ReferenceEquals won't work if the key is a struct, for example.
@MarkCiliaVincenti you could try submitting a proposal about exposing the TryGetValueInternal
, TryAddInternal
and TryUpdateInternal
methods, but I wouldn't be very optimistic about seeing it approved. You would have to present some really good arguments to support it, including usage scenarios and benchmarks, in order to overcome the instinctively negative feelings that such a proposal generates.
You might want to take a look at an informal proposal by @sab39 about a flexible Modify
method that might be sufficient for your scenarios. That proposal (if submitted properly) might have better chances to be approved if it gets enough traction.
in order to overcome the instinctively negative feelings that such a proposal generates.
Why is this so? I'm trying to understand.
@MarkCiliaVincenti because the suffix Internal
in their names signifies that they are intended to be hidden from the public view. :-) Btw there is a TryRemoveInternal
method too.
@MarkCiliaVincenti because the suffix
Internal
in their names signifies that they are intended to be hidden from the public view. :-)
Yes of course, but what I don't understand is the reasoning why they were not made public in the first place. Was it to not add clutter? The people who made these methods did so because they realized that it is more optimal to pass on the already-computed hash code rather than computing it again when making methods such as GetOrAdd
. If it's useful for GetOrAdd
then it should also be useful for whoever needs to do multiple dictionary API calls in their code.
@MarkCiliaVincenti I think that adding public APIs that accept key+hashcode would be an admission that the IEqualityComparer<T>
abstraction has failed. That's a hard pill to swallow. I do understand that these methods could be useful though.
@MarkCiliaVincenti I think that adding public APIs that accept key+hashcode would be an admission that the
IEqualityComparer<T>
abstraction has failed. That's a hard pill to swallow. I do understand that these methods could be useful though.
Why would it have failed though? Because we have such interfaces, it's possible to override the GetHashCode methods, and then there's no limit to how long these methods can be expected to take to return a value. Theoretically, it could take minutes, or hours, although extremely unlikely. More realistically, though, it could take enough time that makes my proposal of exposing these methods feasible.
@MarkCiliaVincenti you could try submitting a formal proposal, present your best arguments, explain why all the alternatives are worse, and see how it goes. Personally I have conflicting feelings about this, so I am not promising that I'll upvote it.
Background and motivation
Sometimes you need to run code only if there's a Get or an Add to the ConcurrentDictionary.
Consider this example:
The problem with this code is that you can't use
GetOrAdd
because in this example one only wants to do the increment on the get, not the add. Therefore, one has to use bothTryGetValue
andTryAdd
, which is also suboptimal because thekey.GetHashCode()
is run again for theTryAdd
(which wouldn't be the case if one usedGetOrAdd
).With this change in place, the above code could be rewritten as such and work more efficiently because it wouldn't call
GetHashCode()
every time it tries to add:I have already tried to make a pull request at https://github.com/dotnet/runtime/pull/79931 but this has been closed as it didn't follow the API review process, but I am linking to it so you can easily see the small code changes.
API Proposal
API Usage
Alternative Designs
No response
Risks
No response