chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org
Other
1.79k stars 420 forks source link

Implement a new remove method for set that returns the item removed #18649

Open bmcdonald3 opened 2 years ago

bmcdonald3 commented 2 years ago

Currently, set.remove() returns a boolean value that represents whether or not the element was removed. It would be desirable to have another method that would return the removed element for cases where a user has overloaded the == operator on their own type.

Proposal 1: remove() and discard() -- sort of do what Python does

Vote for this proposal with a 👍

Proposal 2: keep set.remove() the same and add a more verbose method that returns

Vote for this proposal with a ❤️

Proposal 3: keep set.remove() the same and only implement this new element returning remove if a user requests it

Vote for this proposal with a 🚀

bradcray commented 2 years ago

Noting that this is something of an offshoot of https://github.com/chapel-lang/chapel/issues/15819 (which we could either close in favor of this one, or keep open until both are resolved).

bmcdonald3 commented 2 years ago

As I am getting around to the implementation of this issue, I am realizing that the team is too split to make a call here. If you have a minute in the next week, please put forth your best argument for your position and feel free to shift your votes if you feel compelled by a particular argument (that may be the only way to converge on a solution...).

Those currently in favor of adding a set.discard() method: @mppf @daviditen @aconsroe-hpe @dlongnecke-cray

Those currently in favor of holding off on this change until a user requests it: @lydia-duncan @arezaii @stonea @bmcdonald3

Sorry to ping you all again about this, but trying to make some progress on these follow up issues!

lydia-duncan commented 2 years ago

"remove" to me doesn't have to imply "remove and return". I think it's fine for remove to mean just pull the element from the type and drop it on the floor and we already have a different name for when other data structures remove and return - it seems like unnecessary effort to flip the meaning of the name in all cases and adding a new way to accomplish "remove and return" for this data structure is not a 2.0 breaking change so I think it's fine to wait on it.

bradcray commented 2 years ago

"remove" to me doesn't have to imply "remove and return". it seems like unnecessary effort

I agree with these, and tend to like ❤️ and 🚀 for these reasons. But I think "making things less surprising for Python/Swift users" is a reasonable counterargument for 👍 (which is why I abstained... my personal happiness with the status quo and desire to avoid busy-work is in conflict with not wanting to have programmers complain about this down the line).

dlongnecke-cray commented 2 years ago

I'm not really happy with getAndRemove(), as personally I view that name as "we really wanted to call this thing remove(), but we can't because we already used that name".

And I don't think we can wait for a user request (option 3 ❤️) because it precludes the possibility of any name but getAndRemove() (unless another, even better name strikes us at that time). Because to rename remove() to discard() necessitates breaking some code, and we'll want to avoid breaking changes.

In general I appreciate logic that @mppf told me once 🧙 about how it's reasonable for dangerous/rare/exotic callables to have long names, because it lets you describe them precisely and calls them out at the same time. However I think the opposite should be true for common methods on common types that we can expect to be called all the time. We should endeavor to choose a succinct name that doesn't leave the user (me, in this case 😄) chewing it every time they type it.

"remove" to me doesn't have to imply "remove and return"

I agree that it does not, but it leaves that possibility open (unlike discard). Which is why of the two names I think it is more suited for the purpose of "remove and return".

bmcdonald3 commented 2 years ago

@dlongnecke-cray

it's reasonable for dangerous/rare/exotic callables to have long names

Thinking of Brad's example along the lines of: "why would I want to call set.remove(10) and have 10 returned to me?", this to me seems like the common case, where as the case outlined in the original discussion seemed rare/exotic to me (I'm not going to embarrass myself by trying to summarize that topic here...), in the sense that it requires a deep understanding of how classes in Chapel work even to understand why someone would want the functionality.

I agree that calling set.remove() is common, but I don't know that I necessarily agree that the case where you would like to have the value returned to you is common in practice, and I could be wrong, but this is why I am not particularly convinced by that argument and would still lean personally to holding off on the specific return case for now.

dlongnecke-cray commented 2 years ago

I guess my question is - let's say we do hold off on this and then the question comes up again from a user after 2.0 - what would would our options be? If the only thing we can do is add getAndRemove or removeAndReturn because those are the only names that makes sense and aren't breaking, I'd be pretty bummed.

I understand the whole "let's wait and see instead of make more work for ourselves" argument, but I think that pre 2.0 is the best time to make these changes while we still have the freedom to do so. In which case I'd still advocate for renaming remove to discard even if we opt not to add the "remove and return" variant, just so we can keep our options open in the future.

bradcray commented 2 years ago

@dlongnecke-cray : I feel like your responses have been focusing on the most clunky of alternatives for an alternative name for the value-returning path. Can you comment on take() (which seems very intuitive and short) and fetchRemove() (which has some precedent in fetchAdd() for atomics).

dlongnecke-cray commented 2 years ago

Ah, apologies. You're right that I did tunnel vision in on the clunky names.

I'd be OK with take() but still find myself stumbling over it when comparing it to remove(). It's also hard for me to visualize using it in a e.g. var elem = myMap.take(5); because I don't think I've ever seen that sort of verbiage before. I admit I could probably get used to it , though. EDIT: I think I'm also falling victim to something Michael described where I keep reading myMap.take(5) as giving 5 to myMap.

I'm not the biggest fan of fetchRemove(), because I think it suffers from the same problem as getAndRemove - it feels like a way to double up on the word remove since we already used it, and it's also two words.

bradcray commented 2 years ago

Thanks David. One other question: Have you had a real-world example of using a remove() call in which you've wanted to know the value of the thing removed from the set (i.e., you're not calling remove(10) and you don't need the returned value because you know it's 10)? And if so, could you share it for motivation?

dlongnecke-cray commented 2 years ago

No, I can't think of a specific real-world example off-hand. I think my original motivation for wanting to support both methods was to make sure we had some measure of support for both identity and logical equality. Obviously the two are trivially the same when we're talking about an int, but things seem to get a bit murkier for me when we start comparing e.g. class instances.

Which form of comparison do we plan to use in that case? If we're planning on using == for classes, then isn't that just comparing pointer values? Looking to the future: if we add a equals() or === for classes and use that instead, do we provide a mechanism to preserve an instance to inspect its identity, just in case?

A user might want to maintain some sort of tree like structure where logical equality might be preferred over == for the purposes of caching. One silly use-case might be using a set to construct a DAG, where we want to uniquify nodes by comparing them logically but then reuse the existing node. I don't we don't want to call remove() in that case, so probably not much help, but that's just where my head goes.

bradcray commented 2 years ago

No, I can't think of a specific real-world example off-hand.

That's what made me ask. I feel like we're engineering for a case that none of us have ever actually coded to or felt a need to code to (that I recall hearing about), whereas I have coded to "I want to know whether there was actually a matching value being removed", so find if mySet.remove(10) attractive for that pattern (whereas, with the other definition, I'd need to change the call to either if mySet.remove(10) == 10 or use the other tbd method).

OTOH, since other languages provide it in this form, if that's what users are accustomed to, maybe we should follow suit (which, again, is why I'm on the fence).

Although wait, Python doesn't after all, does it? So remind me: what makes the :+1: option Python-like again?

Swift does return the element, though. Rust returns a bool. Java returns a bool. Let us not speak of C++. It seems like we are not in bad company if we simply continue returning a bool.

If we're planning on using == for classes, then isn't that just comparing pointer values?

By default, yes, but a user can overload it to do a logical comparison of their own definition. And that is something of an interesting example: I define == on two classes, but I want to have the specific object removed from the set returned to me rather than the exemplar I passed in for some reason (just wish I knew what the reason was). That said, I still don't have a case where I've wanted to do this. And it seems more common to me that for cases like that, one would want to pass in a comparator like if myEmployeeSet.remove("Brad", comparators=firstNamesMatch) rather than relying on a single type-wide definition of == for "match and return the thing that matched" types of cases.

OK, so maybe I'm not as much on the fence as I thought. While I think highly of Swift, it increasingly feels like the outlier here.

mppf commented 2 years ago

I want to have the specific object removed from the set returned to me rather than the exemplar I passed in for some reason (just wish I knew what the reason was).

If you are creating some sort of cache of classes, by value, then you will want == to not be comparing pointers, but rather the contents. When you want to look up in the cache, you have to create a value to search on, so you create a class instance that won't have the same pointer value as the one in the cache (since to get that, you'd have to look up in the cache). This kind of thing comes up in the compiler/next code quite a bit. I don't think the specific pattern of wanting to remove a particular cache value has come up there, but I don't see any strong reason why it couldn't. However, I do agree that this issue is a bit hypothetical today.

bradcray commented 2 years ago

And it seems more common to me that for cases like that, one would want to pass in a comparator like if myEmployeeSet.remove("Brad", comparators=firstNamesMatch) rather than relying on a single type-wide definition of == for "match and return the thing that matched" types of cases.

Thinking about this overnight, I've realized that this sort of interface doesn't make sense for the default remove because a hash-table-based set would need to hash on a value to look it up, so passing in a comparator wouldn't support that, and would essentially require doing a test on all elements (and potentially removing multiple of them).

bradcray commented 2 years ago

Here's where it seems like this stands to me:

For that reason, it seems to me that our status quo is acceptable and that we should look into adding the returning remove as an additional method, whether now or later. Again, the main reason I was on the fence before was that the OP said that :+1: was Python-like, making me think Python returns the value, yet it doesn't.

I think the main counterargument we've heard against this proposal (on this issue at least) is that @dlongnecke-cray doesn't really like the names that have been proposed for the returning version of remove. @mppf, @dlongnecke-cray , @aconsroe-hpe , @daviditen : Any other arguments in favor of :+1: or flaws in my reasoning here that you see?

mppf commented 2 years ago

Any other arguments in favor of 👍 or flaws in my reasoning here that you see?

I was wondering to myself if leaving remove with the status quo would make the APIs for various collections consistent.

So it seems that the status quo is already relatively consistent for remove return a bool/count. We could additionally choose to use pop in list/map/set for the remove that returns an element, instead of fetchRemove/take. But anyway this ends up being more an argument for 🚀 than anything else.

bradcray commented 2 years ago

We could additionally choose to use pop in list/map/set for the remove that returns an element, instead of fetchRemove/take.

We have some issues about how "pop" may not be the best name for cases that already use it (https://github.com/chapel-lang/chapel/issues/16194 and to an extent https://github.com/chapel-lang/chapel/issues/18652). The crux of the issue is that (to me at least, though I think others have said similarly), "pop" implies "this is a stack", so applying it to types that aren't obviously stacks seems a bit odd and/or potentially confusing (e.g., which end of the list is the top of the stack? does our set maintain some sort of ordering?). I think in your proposal, we'd also need for pop() to take an argument, which also strikes me as odd for a "pop" routine. But maybe there's a precedent here that I'm just not familiar with? Or maybe I just need to get over my pop::stack issue and think of pop as a general "grab something" routine?

mppf commented 2 years ago

Yeah it's a side issue here, but I definitely think of pop as a "grab something" routine, FWIW.

aconsroe-hpe commented 2 years ago

I find myself leaning proposal 3 🚀 now after re-reading the discussion and thinking about this again.

I went back to look through some code I thought I had written that used the "set remove with == overloaded", but it was actually lookups in a map that essentially did def get(k): return mapGet(canonical(k)) and def set(k, v): return mapSet(canonical(k), v). So the custom equality was given to the custom map instance and not on the type used.

dlongnecke-cray commented 2 years ago

I don't think we can use pop because (in Python at least) it just consumes a random element from the set without any ordering guarantees. So we'd confuse people who are used to different semantics.

I tend to see pop as -> "consume from a sensible end if ordered, consume something if not".

I think I am tentatively ok with :rocket: and (hopefully) using take if we decide to implement the "remove and return" variant, I really wish there was a different one-word name besides take as I can't help but read it as giving something to the set.

So while my heart is set on 👍, you can count me as a 🚀 if you want to move voting forward.

bradcray commented 2 years ago

OK, so unless @daviditen convinces the crowd it's wrong, I think we've converged on :heart: / :rocket: where I view them as being just more proactive vs. lazy versions of the same thing (i.e., "do it now" vs. "wait until someone wants/needs it"). Thanks for the discussion everyone!

mppf commented 2 years ago

Regarding pop -- in list we have pop with an argument and without. So pop(elt) would be the equivalent of the remove-and-return-removed-thing for set but pop() could just remove an arbitrary element and return it.