Closed tzachshabtay closed 3 years ago
@tzachshabtay thanks for the suggestion. Can you update your comment to be more like the format linked to from https://github.com/dotnet/corefx/blob/master/Documentation/project-docs/api-review-process.md ?
@danmosemsft, done.
@ianhays, @safern are area owners and can give feedback, or set to ready for review, as they think best.
This is an interesting proposal. It seems to me that the "best guess" approach would be fine for a pretty limited number of scenarios (e.g. construction in your OP) though.
This is one where I'd like to see if we get more interest before moving forward.
OK, community folks if you would use this proposal please thumbs up the top post. We can let this sit for a while...
Having played with multithreading since a wee child, perfect is the enemy of good when it comes to concurrency. Knowing anything "exactly" is a concurrency anti-pattern. I've wrapped the concurrent dictionary just to get approximate counts many times.
Imagine if a video game wanted to know "exactly" when an event occurred and in what order. I can image it. I lived through the horrible netcode of early FPS video games. Modern netcode, both client and server side event handling has to deal with the relativity of events. What is the definition of "when" or "order" when talking about different frames of reference. This is relativity for programming. There is no universal time. All that matters is the emergent property of correctness and handling conflicts in a reasonable way.
One just needs to be very clear about defining the exact semantics guarantees of ApproximateCount. An example is if ConcurrentDictionary uses buckets and tracks the size of each bucket, and maybe the ApproximateCount returns the sum of the "current" size of each bucket, it is possible to return a value larger than actually happen. So if you read bucket 0 and is size 2, and bucket 1 is empty at this moment, then you read bucket 1 and it's now size 2. The sum would be 4, yet maybe the max number of items was only 2. That shouldn't matter. All that really matters is the value is greater never negative.
And who really cares. Even if you could get the exact value of "2", the very instant you release a lock and try to do something with that "2", it's already possibly changed. Dealing with concurrency is like dealing with floating point values. Never do exactness.
As powerful as the idea of ApproximateCount is, I feel the crux is making sure the documentation is very well defined about what it actually means.
Even if you could get the exact value of "2", the very instant you release a lock and try to do something with that "2", it's already possibly changed.
Count
must already be treated as ApproximateCount
for this very reason.
It would be useful before thinking of a new API to look at the implementation of Count
and see if it can be done with less locking without affecting the performance of the other operations.
Due to lack of recent activity, this issue has been marked as a candidate for backlog cleanup. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will undo this process.
This process is part of the experimental issue cleanup initiative we are currently trialing in a limited number of areas. Please share any feedback you might have in the linked issue.
This issue will now be closed since it had been marked no recent activity
but received no further activity in the past 14 days. It is still possible to reopen or comment on the issue, but please note that the issue will be locked if it remains inactive for another 30 days.
Proposed API
Rationale and Usage
Following the discussion from dotnet/runtime#15249, there seemed to be a consensus that an
ApproximateCount
property can fit plenty of use cases and can be implemented to be faster thanCount
(i.e avoid locking).For example, we might have a cache implementation that does some cleanup or flush-to-disk after reaching some arbitrarily defined X items in the dictionary:
In this use-case it doesn't really matter what the exact value of X is, and is perfectly tolerable if some race condition causes it to be inaccurate once in a while, but the action itself might be called a lot and in quick succession (maybe even in a tight loop), so the cost of the lock in
Count
is not negligible, and the developer might prefer to writeif (map.ApproximateCount> 10000) cleanup();
instead to avoid that cost.Another use-case with similar rationale behind it might be when we want to construct a list from a dictionary (or multiple dictionaries/collections), and need to come up with a reasonable estimate of the list's capacity before-hand to avoid extra copies. We could then write something like:
List<T> list = new List<T>(map1.ApproximateCount + map2.ApproximateCount);
, if we believe that the cost of usingCount
would be greater than the occasional non-exact list capacity.