TimurMahammadov / google-collections

Automatically exported from code.google.com/p/google-collections
Apache License 2.0
0 stars 0 forks source link

method to add to and return count on Multiset #49

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Nice to be able to retrieve the count with add.

So,

multiset.add(key);
int count multiset.count(key)

becomes

int count = multiset.addAndGet(key);

also,
int count = multiset.getAndAdd(key);
int count = multiset.addAndGet(key, occurrences);
int count = multiset.getAndAdd(key, occurrences);

Thanks!

Original issue reported on code.google.com by steven.marcus on 22 Feb 2008 at 11:20

GoogleCodeExporter commented 9 years ago
We have considered modifying add(E, int) to return an int instead of a boolean. 
That
seems better than adding a new method.

We can't change the method signature of add(E), since that method is in the
Collection superinterface. Since people can call add(E, 1) instead, it's not 
worth
adding a new method.

Original comment by jared.l....@gmail.com on 25 Feb 2008 at 10:27

GoogleCodeExporter commented 9 years ago
Jared's right; we're going to fix add(E, int) for you.

Original comment by kevin...@gmail.com on 27 May 2008 at 6:50

GoogleCodeExporter commented 9 years ago
add(E, int) now returns the *previous* count, from which you can easily 
determine the
new count.

Original comment by kevin...@gmail.com on 17 Mar 2009 at 5:20

GoogleCodeExporter commented 9 years ago
That doesn't seem like a good fix to me.  To use it, my code would have to say 
something along the lines of

count = multiset.add(key, n) + n;

which seems fraught with opportunities for error; much simpler and cleaner to 
use

multiset.add(key, n);
count = multiset.count(key);

To be clear - I understand that you can still do this; I just think this 
violates the 
Principle of Least Surprise.

What is the harm in returning the new count?

Original comment by carl.man...@gmail.com on 17 Mar 2009 at 5:34

GoogleCodeExporter commented 9 years ago
The key case here is remove().  If I call remove(e, 5) and get a result of 0, I 
don't
know whether 5, 4, 3, 2, 1, or 0 were removed.  It's important that this 
information
be available when using ConcurrentMultiset for reference-counting.

Once we have remove() and setCount() returning the old count, it makes sense for
add() to fall in line.  I agree that this is not necessarily what you'd expect 
-- us,
either, which is why this has been up in the air for a while -- but it's the 
behavior
that makes the most things possible, even at the cost of making some other 
things
more difficult.

Original comment by cpov...@google.com on 17 Mar 2009 at 5:50

GoogleCodeExporter commented 9 years ago
Agreed; if we'd made the opposite choice, we'd have setCount() and remove() 
working
one way and add() working the other.  So the venerable POLS can work in either
direction on this one.

(For obvious reasons Map.put() and ConcurrentMap.replace() also return the 
previous,
not new, value; so this also keeps with that spirit (even though in this case 
there
is a choice and in that case there isn't).)

Final point: while you use "m.add(k, n) + n", to your frustration, if we did it 
your
way they would be other people needing to use "m.add(k, n) - n"; someone's 
always
screwed!

Original comment by kevin...@gmail.com on 17 Mar 2009 at 6:07

GoogleCodeExporter commented 9 years ago
Concurrency is a bear, and I don't use it enough to properly appreciate the 
restrictions it imposes, but I can see how you would want that information.

When I add an item to a set, I want to know, atomically, whether it was added.  
So 
add() returns a boolean.  OK, I get that.  When I remove, the same is true.  
OK, now 
take that into the world of multisets.

When I add several items to a multiset, I want to know how many of them were 
added.  
So, why not return the number of them that were added?  Likewise with remove(). 
 I 
mean, HashSet.add() returns information about the modification, not about the 
resultant (nor just-prior-to-resultant) set.

To me, the natural analog of HashSet.add() returning a boolean would be 
MultiSet.add() returning an integer that is the number of items added.  Same 
for 
remove().

Do you feel that MultiSet is sufficiently different from HashSet that it should 
have 
a different semantics?  Or do you think that HashSet's was wrong in the first 
place?

I can see how we might want, atomically, any of: old set information, new set 
information, change information.  Maybe it would be better to return some kind 
of 
struct that included all three fields?

You are clearly making some tough design choices; this one just doesn't feel 
right to 
me.  But I don't much do concurrency, and I can accept that things go on in 
that 
realm that are counterintuitive outside it.  Thanks.

Original comment by carl.man...@gmail.com on 17 Mar 2009 at 6:08

GoogleCodeExporter commented 9 years ago
The number added is always exactly the number you told it to, so it's useless
information.

"Do you feel that MultiSet is sufficiently different from HashSet that it 
should have 
a different semantics?"

Yes, absolutely.  What they have in common is what's specified by
java.util.Collection, nothing else.

"Maybe it would be better to return some kind of struct that included all three 
fields?"

This is overkill.  We believe we've provided enough information that the user 
can
determine whatever they need to know from it.

Original comment by kevin...@gmail.com on 17 Mar 2009 at 6:28

GoogleCodeExporter commented 9 years ago
Re-reading my message, I realize that I didn't give the right example.  As you 
point
out, it would be enough for remove() to return the number removed.

What I'm really thinking of (and the only place I've seen in Google code that 
uses
the result of Multiset.remove(Object, int)) is code that need to know whether 
it has
just removed the final occurrence of an element:

if (remove(e, 1) == 1) {
  // removed the last one
}

This (I think) is the thing you just can't do concurrently if remove() returns 
how
many items it removed.  Now, that still leaves the possibility of returning the 
new
count.  But *that* makes the first example I gave impossible (though I don't 
have a
use case for that example offhand).  It also makes it impossible to check for 
errors
by confirming that no one calls remove(e, 1) when zero occurrences remain.

I think Kevin's point is a good one: Even though it's a Collection, Multiset is 
in
some ways more like a Map or ConcurrentMap, whose mutation operations return 
the old
state, not the new state.

Original comment by cpov...@google.com on 17 Mar 2009 at 6:31

GoogleCodeExporter commented 9 years ago
I wonder how useful it really was to know that the last occurrence of an 
element was 
removed, in a concurrent setting. By the time you realize that this was the 
case, 
there might already be millions of that elements into the multiset again. 
Strange.

But if there was something external that made such a race condition impossible 
or 
harmless, then I don't see why "!contains(e)" could not be used as well.

Original comment by jim.andreou on 17 Mar 2009 at 6:56

GoogleCodeExporter commented 9 years ago
Clearly we would have a lower-latency discussion if I just posted the code in 
question :)

In the particular reference-counting scenario I'm dealing with, there are three
operations:
1) Create a new object with a reference count of 1
2) Duplicate a reference to an object you already have a reference to
3) Remove a reference to an object you already have a reference to

(2) is the only way to increase the reference count for an existing object, so 
it's
possible to go from 1 to 2 or 5 to 6 but not from 0 to 1, since you need to 
have a
reference to make a copy.

But when it comes to removal, we can't just check that zero items remain after 
the
call to remove() because the two threads with the last two references might
interleave as follows (it sounds like you understand this, but for 
completeness):

(2 items remain)
A calls remove()
B calls remove()
A calls count(): 0
B calls count(): 0

A and B would both try to perform a cleanup task that should be performed 
exactly once.

This may be an atypical use of a Multiset, but one of our goals was not to make 
the
mistake of having a Multiset interface and a ConcurrentMultiset interface a la 
Map
and ConcurrentMap, and providing these surprising return values allows us to
accomplish that goal.

For what it's worth, few people inside Google use the return values from add(E, 
int)
and remove(Object, int).  But I also believe that Multiset is underutilized, 
perhaps
in part because the return values used to be less useful.

Original comment by cpov...@google.com on 18 Mar 2009 at 2:42

GoogleCodeExporter commented 9 years ago
I see. Nice to know - I was sure it had to be something interesting! So while 
in a 
general such a value is useless if it describes a state which can be already 
invalid, 
one can externally impose the additional guarantee that a particular state is 
final, 
and magically the value is meaningful again in that case.

Such final states remind me of the roach motel principle: you enter zero, you 
stay 
zero :)

Original comment by jim.andreou on 18 Mar 2009 at 3:48