JuliaLang / Juleps

Julia Enhancement Proposals
Other
67 stars 24 forks source link

Find Julep: issue with sentinel values #47

Closed GregPlowman closed 6 years ago

GregPlowman commented 6 years ago

Originally posted as comment on commit, suggested to open issue instead. cc @nalimilan

Find Julep

Extract from section Issues Beyond the Scope of This Julep

Note that using first(linearindices(A))-1 as a sentinel value would be non-breaking for standard 1-based arrays. If A starts indexing at typemin(Int), then returned sentinel value would wrap-around to typemax(Int) (i.e. no error).

The calling code could then test for the sentinel value in the same way:

i = findfirst(!iszero, A)
if i != first(linearindices(A))-1
  ...
end

To simplify, could introduce new function sentinelindex, and use this consistently instead:

sentinelindex(A) = first(linearindices(A))-1
i = findfirst(!iszero, A)
if i != sentinelindex(A)
  ...
end
nalimilan commented 6 years ago

@timholy @StefanKarpinski Do you think that's something we should consider for 0.7?

ararslan commented 6 years ago

It would likely be a painful deprecation, but with efficient unions we could have findfirst return nothing when no match is found.

Sacha0 commented 6 years ago

Ref. https://github.com/JuliaLang/julia/pull/24673#issuecomment-346367983. Best!

nalimilan commented 6 years ago

@timholy Do you want this in 1.0? Now is the last chance to push for this change. ;-)

StefanKarpinski commented 6 years ago

It does seem like it may be a better approach.

timholy commented 6 years ago

Yes, it would be much better not to use 0 as a sentinel value. Returning nothing has a lot of merit because with existing code you'll get errors rather than wrong values.

nalimilan commented 6 years ago

@timholy Would you have time to make a pull request? I'm pretty swamped with my 1.0 items already (but I can help reviewing it).

BTW, something I realized is that for AbstractDict and other types whose indices/keys can be of type Nothing, returning nothing is ambiguous. I think that's fine since in most cases the key type does not contain Nothing, but we could support e.g. findfirst(Some, pred, dict), in which case the returned value would be Union{Some{T}, Nothing}.

timholy commented 6 years ago

Yes, now that I reached a fixed point on the broadcasting code, I'll try to get to it.

I haven't followed find-related issues. What's the meaning of the 3-argument variant of findfirst? Link?

nalimilan commented 6 years ago

Yes, now that I reached a fixed point on the broadcasting code, I'll try to get to it.

Great!

I haven't followed find-related issues. What's the meaning of the 3-argument variant of findfirst? Link?

It doesn't exist yet, that was just a proposal for how we could handle dictionaries whose key type includes Nothing. But that's for post-1.0 I think.

andyferris commented 6 years ago

So... I feel a little heretical - but isn't Nullable the exact thing you want for when you might expect not to get a valid return in some cases? Of course, there's similar approaches such as a pair of results or the Union{Some{T}, Nothing} approach. (Did Some end up in Base?)

It seems a little dodgy to not allow for certain keys for generic indexables in the generic case (there's also a sentinel Symbol used in Base). I think this is the kind of situation where you want to follow some solid software engineering practices, and use Some or Nullable.

StefanKarpinski commented 6 years ago

Perhaps, but that complicates the API considerably and it's hard to imagine a case where nothing would be a valid or useful index value for an array-like object. Sure, it could for something like a Dict where keys can be anything, but I think that designing these APIs for the most generic possible structure will make us crazy.

jrevels commented 6 years ago

In similar cases in my own code, I've taken to creating a singleton type for sentinel values (e.g. struct NotFound end). It's totally unambiguous, since the only intended use for the instance of the singleton type is as the sentinel value.

StefanKarpinski commented 6 years ago

It's only unambiguous because you don't use NotFound objects as keys. We can't guarantee that for Dicts for example.

vtjnash commented 6 years ago

I wonder if there's some parallel to get here, where I think we will want to eventually have:

get(Dict, Key, Default) = Value || Default
get(Dict, Key) = Value || throw(Error)
get?(Dict, Key) = Some(Value) || nothing
StefanKarpinski commented 6 years ago

I think there's a somewhat fundamental split between containers like arrays where the keys are known to be something simple like an Int where lookup is quite fast. Not only does one want a simpler sentinel value for such structures, but they're also the kinds of structures where you may just want to return the key since getting the associated value is quite efficient. For more complex structures, the key can be anything and lookup may be non-trivial. In such cases, not only might you want to distinguish nothing as a key from nothing as a sentinel, but you also may want to either return key, value pairs since lookup is expensive or return some kind of token object for which lookup is more efficient than for keys.

jrevels commented 6 years ago

It's only unambiguous because you don't use NotFound objects as keys. We can't guarantee that for Dicts for example.

I mean that it's unambiguous from an API standpoint - the singleton/sentinel type only has a single purpose/use case that it is explicitly defined for.

The ambiguity problem arises when a user would be using a specific value for their own purpose, and then your API claims that value as a sentinel value. It seems to me that a commonly punned-upon singleton value like nothing would be prone to this issue. If your API just defines its own singleton type for the sentinel value, however, this problem mostly goes away.

Of course, you can't force the user not to abuse your API's singleton/sentinel value, but IMO it'd be far less likely to get abused than something like nothing.

ararslan commented 6 years ago

Would it be incredibly annoying to deprecate the use of nothing as an index and require it to be Some(nothing)?

StefanKarpinski commented 6 years ago

I mean that it's unambiguous from an API standpoint - the singleton/sentinel type only has a single purpose/use case that it is explicitly defined for.

Yes, but it's not uncommon for very generic code to put completely arbitrary objects into dictionaries that may have been arrived at by non-explicit means like reflection on the program state. You cannot guarantee that a NotFound object will never be used as a key in such situations unless you disallow using it as a key (in which case you have to treat it as a special case in such generic code).

jrevels commented 6 years ago

it's not uncommon for very generic code to put completely arbitrary objects into dictionaries that may have been arrived at by non-explicit means like reflection on the program state.

This is a well-articulated point, and I agree that there could be sentinel-value-dependent bugs in generic code regardless of which sentinel value an API selects.

Generic code eventually boils down to type-dependent code, though, and at that point, it seems to me that a API-specific singleton sentinel value (e.g. NotFound()) is more likely than alternatives (e.g. nothing) to cause an error when misused, rather than silently take the wrong (EDIT: "unintended" is probably a better word) code path. Specifically, it seems that NotFound() would arrive at a MethodError earlier than nothing would, since there would likely be many more methods defined on Nothing than NotFound in general.

For the record, I'm not actually very opposed to nothing as a sentinel value - in practice, it'd probably be fine either way - I just thought the issue merited consideration.

timholy commented 6 years ago

Definitely merits consideration. But I think that no matter what you picked, you could get the "disallowed" value used as a key if we support enough container types. All it takes is this:

found_keys = OrderedSet()
i = findifrst(f, container)      # if i is NotFound() this will later lead to trouble
push!(found_keys, i)
...
for k in found_keys
    kindex = findfirst(iequal(k), found_keys)   # ambiguous
timholy commented 6 years ago

I wonder if there's some parallel to get here

In #25472 I initially tried adding an optional senintel argument to find* functions. Obviously the sentinel would have to be untyped. The thing that made me back off of that was the scary ambiguity situation. If we wanted to specialize findnext on, say CartesianIndex keys, having a sentinel greatly increases the number of stupid versions of the functions you have to add to resolve ambiguities.

jrevels commented 6 years ago

I don't think that - and am not trying to argue that - the chosen sentinel value should/could be disallowed as an element/key of arbitrary containers. I realize now that that's been the central point of contention here, so I should have clarified that earlier, my bad. That wasn't even the kind of situation I was thinking about.

Rather, I was thinking of the problems that arise in situations like:

result = findfirst(f, container)
if some_application_specific_predicate(result)
    do_something(result, container)
end 

where some_application_specific_predicate and do_something could contain generically written business logic. If a sentinel-value-related bug happened to be encountered internally to these functions, it seems likely that NotFound() would result in a MethodError more quickly/often than nothing would, since nothing has more uses/responsibilities than NotFound() does (i.e. has more methods directly defined on it).

nalimilan commented 6 years ago

If a sentinel-value-related bug happened to be encountered internally to these functions, it seems likely that NotFound() would result in a MethodError more quickly/often than nothing would, since nothing has more uses/responsibilities than NotFound() does (i.e. has more methods directly defined on it).

nothing is precisely used in situations where it must not propagate silently already, e.g. as the return value of match or tryparse when they fail. Very few methods are defined on ::Nothing for this reason. I think the only method which specifically accepts it is coalesce, which is specially designed for this case.

jrevels commented 6 years ago

True. I feel that it's not too uncommon for external packages to adopt nothing for their own sentinel value purposes, in which case that's where the ambiguities would likely come from, rather than from Base itself. However, I could easily be wrong on that hunch, and I don't feel like scouring METADATA for proof.

At this point, it seems as though the idea of an API-specific sentinel value has been given (perhaps more than) its fair due of consideration, so I happily concede to the nothing camp :)

nalimilan commented 6 years ago

Would it be incredibly annoying to deprecate the use of nothing as an index and require it to be Some(nothing)?

@ararslan I think so. People don't necessarily anticipate that their code could be passed nothing by a caller, possibly through several layers. It would be problematic to throw an error when nothing is encountered, as it could happen only at runtime, or when a user tries a very specific thing.

That said, the same problem would arise if we allow nothing as an index, as it could be misinterpreted as indicating the absence of a result, which cannot be detected before runtime. It seems that the only practical way of preventing this kind of problem would be to return Union{Some{K}, Nothing} for AbstractDict{K} where K >: Nothing. This could either happen automatically, or by throwing an error and requiring people to use a specific find* function/method.

nalimilan commented 6 years ago

I've just realized indexin uses 0 as a sentinel. I guess we should change it to use nothing too? It could also use the same index type as other find* functions for consistency.