Raku / problem-solving

🦋 Problem Solving, a repo for handling problems that require review, deliberation and possibly debate
Artistic License 2.0
70 stars 16 forks source link

Review zero and null values for Ranges #410

Open librasteve opened 9 months ago

librasteve commented 9 months ago

Currently, v.6.e.PREVIEW, we have the following:

say so 0..0;   # True
say so 1..0;    # False (not well formed)
say so Nil..Nil;   # prior version was True, now errors...

Range cannot be iterated over because its starting point 'Any' does not
have a '.succ' method
  in block <unit> at <unknown file> line 1
  in any <main> at /Users/stephenroe/.rakubrew/versions/moar-2023.10/bin/../share/perl6/runtime/perl6.moarvm line 1
  in any <entry> at /Users/stephenroe/.rakubrew/versions/moar-2023.10/bin/../share/perl6/runtime/perl6.moarvm line 1

My (naive) assertions are that:

0..0 should be reserved as the (only) 'zero Range' so that

say so 0..0;                     # False
say so (0..0 ~~ Range); # True

and that (if ever defined) a Range div Range operation could then fail with 0..0 as the denominator

and

Nil..Nil should be reserved as the (only) 'null Range' so that

say so Nil..Nil;                    # False
say so (Nil..Nil ~~ Range); # True

Since 6.e.PREVIEW is making a breaking change to the behaviour of Nil..Nil, I would suggest we try to squeeze in the latter as a fix prior to closing 6.e production...?

See this SO question for some context: https://stackoverflow.com/questions/77608044/is-there-an-elegant-way-in-raku-to-see-if-two-ranges-overlap

librasteve commented 9 months ago

Seems that use of Nil as an endpoint for Range is already problematic:

Welcome to Rakudo™ v2023.10. Implementing the Raku® Programming Language v6.d. Built on MoarVM version 2023.10.

Nil..0    #Nil..0

but

0..Nil

Use of Nil in numeric context
  in block <unit> at <unknown file> line 1
  in any <main> at /Users/stephenroe/.rakubrew/versions/moar-2023.10/bin/../share/perl6/runtime/perl6.moarvm line 1
  in any <entry> at /Users/stephenroe/.rakubrew/versions/moar-2023.10/bin/../share/perl6/runtime/perl6.moarvm line 1

So somewhere in the code there's an "implicit assumption" that Nil < 0

librasteve commented 9 months ago

So, a cleaner approach would be to define a (new) symbol for the null Range such as:

𝟘

𝟘 MATHEMATICAL DOUBLE-STRUCK DIGIT ZERO Unicode: U+1D7D8, UTF-8: F0 9D 9F 98

with texas variant like

Range()

Then "all" that needs to be done in core is:

librasteve commented 9 months ago

https://docs.raku.org/language/operators#Identity would also need to be updated

niner commented 9 months ago

I'm not following your reasoning. To me 0..0 is equivalent to a List containing a single 0. As such I expect it to be True as a non-empty List coerces to True.

gfldex commented 9 months ago

We can express an undefined Range already. It is (Range).

I do agree that ranges with undefined values (Nil, Type-Objects) should result in an undefined Range, so that any :D-type smily would triggers as expected. My Sprachgefühl (sorry, no english word available) tells me, Nil..Nil should turn into (Range) because Nil indicates that absense of a value but definite Range-objects are values. Thus, here is something deeper amiss. The construction of such an impossible Range object could be a compile time error, because the value is impossible. For the case that the object has to be contructed a runtime, things get even more problematic. Nil.Int warns and turns the absense of a value into 0. If we want consistency, we should preserve Nil and warn. So, Nil..Nil should iterate to Nil, have .so & .defined ~~ False and .elems == 0.

Ranges with type objects like Int..10, Any..Any should at least return (Range) and thus be undefined and not definite. I'm undecided if they should warn but leaning to, yes, they should.

I can't thinks of a case where the construction of an impossible Range would not be a bug. So fail with a distinct X::Range:Impossible would also be an option. That way we can produce a proper error message that contains both endpoints given to the constructor of Range to help debugging.

The whole thing is tricky to decide because Raku is ment to soldier on, even if odd values pop up in any container that is not guarded by :D.

raiph commented 9 months ago

What about:

duncand commented 9 months ago

I find that the best way to characterize an interval (a Range) is that it is a set of values such that the set is possibly of infinite cardinality, eg when you have 2 real numbers or 2 character strings as endpoints, in contrast with a typical set defined in terms of an enumeration of individual values and therefore is of finite cardinality. This is particularly true in the case of a set of intervals which can be non-continuous. And so, when considering what is reasonable behavior, we should try and define semantics for a Range which are logically consistent with semantics for a set.

duncand commented 9 months ago

Continuing on, for the generic case of a Range of Any...

  1. A Range with 2 closed endpoints that are the same value defines a set of exactly 1 value which is that common endpoint.
  2. A Range with 2 endpoints that are the same which is open at either end defines an empty set and/or is invalid.
  3. Nil is a single value in generic context like anything else, so Nil..Nil defines a set of exactly 1 value which is the value Nil, same as if you defined an Array or such with Nil as its sole element.
  4. If Raku defines the generic order-determination operator between different types, for example you can sort a list containing Foo and Bar objects and know deterministically that one will sort first without errors, it is likewise valid to define a Range with a Foo endpoint plus a Bar endpoint, since Range semantics are defined over regular ordering.
duncand commented 9 months ago

I feel that while it may not make sense conceptually to sort different types, eg it doesn't make sense to sort numbers and strings, if Raku at least provides a deterministic default generic sort that can compare two Any and give a defined result, then a Range can always function with any 2 end points no matter what they are. Same as generic equality should always be defined between any 2 things, returning a defined result, although arguably that is easier to argue for and implement than generic sorting. And if it doesn't make conceptual sense, the solution is to just not combine both types in the same Range, but if someone does it, it should "work".

librasteve commented 9 months ago

Some very good & thoughful comments - let me try and summarize my reaction (corrections to the are welcome if anything is misconstrued):

As a consumer of raku, I think the following makes most sense:

  1. On "zero Range"

(0..0).so #True <=== this

this is what we have today in 6.e.PREVIEW - phew!

rationale

  1. On "badly drawn Ranges"

(Range) is good (thanks @gfldex ! - I couldn't see the wood for the trees)

(Nil...Any) or (Any...Nil) or (Nil..Nil) returns Nil ie. Soldier on by propagation of Nil outwards (compile time or runtime)

(Any:U...Any:D) or (Any:D...Any:U) or (Any:U..Any:U) returns (Range) ie. Propagate undefinedness outwards

Implementors (see below) can use (Range) or its derivatives as the return value when no defined result exists.

I am not sure about warn.

Specifically on my Math::Interval module, this has the following effect:

> my $i1= (0..0).Interval             #0..0
> my $i2= (1..1).Interval              #1..1
> my $i3 = $i1 ∩ $i2                    #Set()               <== this is now wrong
> my $i3 = $i1 ∩ $i2                    #(Interval)        <== I will change to this
> (Interval) ~~ Range                 #True (since class Interval is Range)
2colours commented 9 months ago

I feel that while it may not make sense conceptually to sort different types, eg it doesn't make sense to sort numbers and strings

To reassure this: 1..'foo' wouldn't even work currently, as the type of the left side governs the understood type of the Range.

jubilatious1 commented 5 months ago

As I see it, ... triple-dot Sequences are currently broken: #407

Whatever is decided for Ranges, please table it until ... triple-dot Sequences are fixed. Otherwise we won't have a reliable indexing mechanism in Raku.

Thank you.

librasteve commented 5 months ago

@jubilatious1 ... I am relaxed about what the core dev team decides has priority, but I am surprised that comma separated triple-dot Sequences are the go to option for array indices - anyway that issue is the place to discuss more fully, I guess...