chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org
Other
1.78k stars 420 forks source link

Should `set.remove()` return the removed element? #15819

Closed dlongnecke-cray closed 1 year ago

dlongnecke-cray commented 4 years ago

Currently set.remove() does not return the removed element at all, instead choosing to drop it on the floor. This choice was inspired by Python, where set.remove also returns None instead of the removed value.

Some of my reasoning for why Python's set.remove might return None instead of the removed element:

Chapel has value types (records) and reference types (classes) while Python only has reference types. In Chapel values added to a set are not required to be immutable. To achieve immutability without this requirement we make a copy of any element added to a set and only allow access to the set element by const ref.

In addition to copying added elements, because users can overload the == operator for their types, this can create a scenario where two instances of a type (such as a record) evaluate as equal but contain significant differences.

While one might argue that this is not overloading the == operator in good faith, that this can happen in practice suggests that set.remove() should return the removed value, -OR- at the very least we should add another method that returns the removed value such as removeAndReturn.

mppf commented 4 years ago

@dlongnecke-cray - I'm not sure if you expressed an opinion for the answer to "Should set.remove() return the removed element?" but my opinion is that the answer is "yes, it should".

dlongnecke-cray commented 4 years ago

Oh, good point.

I think set.remove() should return the removed element as well.

Do you know what happens to returned values that are not assigned in a multilocale context? Are they still potentially communicated across locales? This might be a separate issue at heart, but it might be reason enough to justify adding a method to set such as .drop() that removes the value without returning it.

mppf commented 4 years ago

Well, if set.remove contains an on statement, then yes, removing it on a different locale from the set would introduce communication. But most likely if somebody cares about performance the don't want the communication related to the on statement either...

dlongnecke-cray commented 4 years ago

I wrote the last sentence of my question wrong, edited for clarity ><.

bradcray commented 4 years ago

What will set.remove() return if it doesn't find the element?

dlongnecke-cray commented 4 years ago

Well historically (based on what we've done for other such methods), we'd have it halt. An alternative is to make it throw, though I think we've generally disapproved of such an approach. If we want to break precedent we use an out parameter and return a bool...

bradcray commented 4 years ago

That makes me think I might prefer to stick with the current interface (which, as I understand it, returns true or false depending on whether or not it was found?).

On my first quick read of this issue, it seemed odd to me that in saying mySet.remove(10) I'd care to have 10 returned back to me. I think I understand your OP's point to be that since (1) sets look for the element using == and (2) a given == may permit some sort of approximate match, (3) the value that's remove()d might be different in an interesting way than the thing I passed in. And while I can see that's the case, I'm pretty sure I'd rather get true or false back instead of taking a halt.

If we either used the out argument you propose (or maybe cleaner, returned a tuple), we'd need to have some way of creating a dummy value of eltType for any type used to create sets which seems iffy to me (and potentially expensive depending on the type).

If we had option types, we could have it return an option type value of course.

We could also consider having remove() remain as it is today and have a fetchRemove() or somesuch that would return the value in one of these forms if/when we wanted to add it.

dlongnecke-cray commented 4 years ago

I would prefer to introduce a method like .drop, which will retain the current semantics of .remove(), and then update .remove() to halt if the element is not contained in the set.

Though I like the above names a bit more than fetchRemove, your idea would avoid breaking existing code, so it's probably better.

bradcray commented 4 years ago

I worry about having something as fundamental as set.remove() halting in a parallel language. It seems like there'd be no way to safely remove elements in a parallel context because you'd never know whether someone else would slightly beat you to removing them.

Is there a precedent for .remove() returning the value in other value-based languages like C++?

dlongnecke-cray commented 4 years ago

It seems like there'd be no way to safely remove elements in a parallel context because you'd never know whether someone else would slightly beat you to removing them.

I view it as a problem also shared by list.pop and map.remove. It seems like parSafe=true should handle a data race nicely?

Is there a precedent for .remove() returning the value in other value-based languages like C++?

It looks like the C++ unordered_set type has erase, which returns 1 if the element was removed, so basically the exact same semantics as set.remove has today.

bradcray commented 4 years ago

It seems like parSafe=true should handle a data race nicely?

What does "nicely" mean if the routine halts, though, when the value you thought was there isn't because someone else just removed it?

I agree map.remove seems like it should share a similar design. list.pop() feels a little different because multiple things could pop simultaneously and both succeed... until the list becomes empty, I suppose.

dlongnecke-cray commented 4 years ago

Part of me wants to say that I feel like a more general locking mechanism for containers in the spirit of https://github.com/chapel-lang/chapel/issues/12306 could help solve this problem and others.

We could also revisit the idea of having set.remove throw an exception instead of halting once again (it seems like exceptions were designed to solve this sort of problem).

mppf commented 4 years ago

We could also revisit the idea of having set.remove throw an exception instead of halting once again (it seems like exceptions were designed to solve this sort of problem).

Halting for this seems untenable to me. Throwing would be fine, in my opinion.

If we had option types, we could have it return an option type value of course.

I'm pretty sure we could get an option type together in short order, if we really wanted to.

I agree map.remove seems like it should share a similar design.

I think the problem is worse for map.remove because with a map you'll remove based on the key but might want to get the value back.

bradcray commented 4 years ago

I'd still be inclined to leave remove() as-is on the basis of (a) not breaking existing code, (b) C++'s precedence and then adding a removeFetch() or remove(val, returnVal=true) or something overload if/when someone really wants this feature.

dlongnecke-cray commented 4 years ago

If we want to throw then there are a whole lot of other cases where we currently halt that we might want to revisit.

I'm fine with leaving things as is for now but I do think for set to be feature complete we should offer some variant of removeFetch, remove... (as Brad suggests) in the future.

bradcray commented 2 years ago

Some notes from a Set deep-dive meeting today:

Names for throwing routine that returns the value:

Names for routine that just returns a bool:

Noted that:

dlongnecke-cray commented 2 years ago

I prefer remove() or take() for the option that returns the value. I agreed with Brad about disliking take() at first, but the name has grown on me because I think the verb strongly describes the action taking place, moreso than remove(). (I can see how one might confuse that with saying "this set method takes() an item", though.) I am ok with discard() for the bool version that does not throw but would really love drop().

For the future version that may or may not return an option type I'd be ok with maybeRemove or maybeTake (on the assumption that we will name the option type Maybe).

proc ref remove(x: eltType): eltType throws;
proc ref drop(x: eltType): bool;
proc ref maybeRemove(x: eltType): Maybe(eltType); // Somewhere, in a galaxy one major version away
bradcray commented 2 years ago

For the future version that may or may not return an option type I'd be ok with maybeRemove or maybeTake (on the assumption that we will name the option type Maybe).

<crazy idea>Maybe we permit identifiers to end with ? and take the stylistic approach that routines that return option types end in ?, so remove?() for the option type version</crazy idea>

dlongnecke-cray commented 2 years ago

That is a really beautiful idea, and one of the reasons why it appeals to me is that I've always imagined nilability to just be a very focused application of option types. My secret hope is that one day we could unify the two so that nilability is nothing more than syntactic sugar for Maybe(SomeClass), but I don't know how feasible that would be.

dlongnecke-cray commented 2 years ago

I do worry that we might be overburdening the meaning/disambiguation of the ? token if we were to do that, which is IMO the biggest reason why that may not be a good idea. Another reason is it's essentially relying on convention (I mistakenly interpreted your idea as meaning a proc name suffixed with ? would automatically have its return type transformed into an option type), and I thought we were trying to get away from that and relying more on language patterns (the last discussion I recall being about e.g. sync lock identifiers suffixed with $).

bradcray commented 2 years ago

relying on convention..., and I thought we were trying to get away from that and relying more on language patterns (the last discussion I recall being about e.g. sync lock identifiers suffixed with $).

I don't think we're anti-convention in our standard libraries (which is why we're putting in effort trying to unify on capitalization conventions, argument names, etc.). The issue with $ was that it was a convention designed to protect users from deadlocks by making reads/writes of synchronization variables, which was a red flag that maybe the language design should not have made that so invisible that the convention was necessary to understand the code (where the current requirement to use a .readXY()/.writeXY() variant addresses that).

In contrast, in proposing this, I'm trying to come up with a convention which would pack some attractive options into a restricted namespace without requiring someone's favorite version of the routine to become to verbose / ugly (e.g., haltingRemove(), throwingRemove(), or optionRemove()). I don't think this is actually workable (because ! already has too many meanings to be included in user identifiers), but I sometimes wish we could have:

mppf commented 2 years ago

I like the idea of mySet.remove?() but I think we might have parser challenges with it if applied broadly (i.e. any identifier can end in ?), because it could mean ~two~ three things:

I am not saying these are impossible to support at the same time - just that it is not already obvious to me how hard this would be in the parser.

The remove?() makes sense to me as a way to indicate it returns an option type. But I don't think that a halting version should be in the API design at all as an alternative, so I wouldn't want remove!().

Also, I am worried that combining with optional chaining (see #17577) will be confusing. Say you wanted to call a method on the removed element if there was one, with optional chaining. You would write mySet.remove?()?.someMethod() which is maybe too much...

bradcray commented 2 years ago

because it could mean two three things:

Good point. I wasn't thinking carefully and was only thinking of ? being supported as a type query type of symbol and forgetting about its (still relatively new) role as a nilable class type specifier.

lydia-duncan commented 1 year ago

Given the decision on #18649 to keep remove as is and support a different method for removing and returning, I'm closing this issue