Raku / problem-solving


Behavior of type-constrained parameters is surprising given other behavior #426

Open landyacht opened 7 months ago

landyacht commented 7 months ago

Minimal Example

sub foo(Int @a) { dd @a; }
foo [1, 2, 3];
# Type check failed in binding to parameter '@a'; expected Positional[Int] but got Array ([1, 2, 3])

Why This is Surprising

Other behavior in Raku would lead one to believe that type constraints apply not to the container (variable) as a whole, but rather to the element(s) that can be put into or pulled out of the container, the exact meaning of which depends on the type of said container.

Specifically, we do not constrain an array to only hold Ints with Array[Int] @a, rather we do Int @a. We do not constrain a hash to hold only Ints with Hash[Int] %h, rather we do Int %h. We do not constrain a callable to return only Ints with Callable[Int] &c, rather we do Int &c. All this implies to the newly-learning Raku developer that constraints apply "intelligently" based on container shape, rather than "dumbly" to the container as a whole. In other words, the user doesn't have to worry about typing whole containers as long as the contained elements satisfy the constraint.
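
For example:

my Int @a = 1, 2, 3;          # element constraint: each element must be an Int
my Int %h = a => 1, b => 2;   # value constraint: each value must be an Int
@a.push: 'x';                 # dies: expected Int but got Str ("x")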

This is further reinforced by the fact that literals with no explicit type specified can be assigned into type-constrained containers, e.g. my Int @ints = [1, 2, 3]. Nowhere does the programmer explicitly say that [1, 2, 3] is Array[Int], yet the language accepts it into an Int-constrained @-sigiled container. Why, then, should the Minimal Example fail?

Should This Behavior Change?

More experienced developers understand that the underlying difference is that between assignment (=) and binding (:=), and what happens when calling a code block is binding to parameters, not assigning. While that suffices as a technical explanation, it does not feel sufficient as a philosophical justification for the surprising behavior. I (and I believe others too) would like to be able to apply the "intelligent constraint" principle consistently, or at least in the absence of something "unusual" like my @a is Array[Foo].

Furthermore, it seems to violate the concept of optional, gradual typing. Going back and adding type constraints to your quickly-prototyped code should not break it if that constraint was always satisfied anyway. It feels wrong that the Minimal Example could be made to work, with equivalent function, by removing the type constraint on foo's parameter.
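
A minimal before/after of that gradual-typing complaint:

sub total(@xs) { [+] @xs }
say total([1, 2, 3]);        # 6: the untyped prototype works

sub typed-total(Int @xs) { [+] @xs }
say typed-total([1, 2, 3]);  # dies in binding: expected Positional[Int] but got Array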

Practical Considerations

Making parameter binding behave as naïvely expected is easy enough for cases where the container's values are all known at compile time. However, in the general case, we would likely have to scan through arrays and hashes, and I'm not even sure what sort of black magic would be necessary for callables (perhaps those are an acceptable exception where the user must explicitly call out the return type).

librasteve commented 7 months ago

here's my proposal:

let's extend the way that the array literal works

[1,2,3].Int gives Array[Int]

sub foo(Int @a) { dd @a; }
foo [1, 2, 3].Int;       # works
foo [1, 2, 3];            # fails

also we would need

my @a = [1,2,3].Int;
@a ~~ Array[Int];    #True

... then we will need to update the docs and error messages so that this is obvious to a newbie


Edit:

and the hash literals: {:a(1),:b(2),:c(3)}.Int gives Hash[Int], and %(:a(1),:b(2),:c(3)).Int gives Hash[Int]

landyacht commented 7 months ago

@librasteve couldn't that break code relying on the convention that the numification of an array is its number of elements? Granted, numification and intification are two different things, but I think it would be surprising for +[4, 5, 6] to yield 3 while [4, 5, 6].Int yields Array[Int].new(4, 5, 6)

librasteve commented 7 months ago

yes - I don't see anything here that could be done as a non-breaking change

the benefit of my proposal is that this is only for the Array and Hash literals - while there possibly is code out there with [1,2,3].Int and so on, I don't see that idiom as being widely used

would definitely go for a non-breaking change if one can be found

niner commented 7 months ago

Sigils are shortcuts for type constraints (and default values). @ means Positional (default value Array). Int @ means Positional[Int] (default value Array[Int]). They are roughly equivalent. Thus sub foo(Int @a) means pretty much the same as sub foo(Positional[Int] $a). Thus while at first glance you may think that Int @a is a type constraint on an array's values, it really is a type constraint on the array variable itself. It means that @a holds an object that does the Positional role, parameterized by Int or in other words: an array-like thing that by definition can only hold Ints. We thus can rely on that object to have done any appropriate type checks on its members when they get stored into it. I.e. I don't have to check whether a Positional[Int] really only contains Ints.
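
In code:

my Int @ints = 1, 2, 3;
sub foo(Int @a)             { @a.sum }
sub bar(Positional[Int] $a) { $a.sum }
say foo(@ints);                    # 6
say bar(@ints);                    # 6
say @ints     ~~ Positional[Int];  # True
say [1, 2, 3] ~~ Positional[Int];  # False: a plain Array is not parameterized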

An important difference between assignment and binding is that when assigning containers, we always use copy-semantics. Remember, we put the responsibility of the type checks on the container when elements are stored and my Int @a = [...] is really a call to @a.STORE([...]). Thus @a has to type-check all supplied values. The flip side is that for copying we have to touch all those elements anyway and thus the type checking is almost free.

Binding on the other hand is performance sensitive. You probably and I for sure wouldn't want Raku to iterate over all arrays passed to any function. That would be hugely expensive. Thus we have to rely on the container type to avoid that cost.
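
The asymmetry in one snippet:

my Int @a = [1, 2, 3];    # assignment: @a.STORE copies and type-checks each element
dd @a;                    # Int @a = Array[Int].new(1, 2, 3)
my Int @b := [1, 2, 3];   # binding: only the container's type is checked, so this
# dies: Type check failed in binding; expected Positional[Int] but got Array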

Lastly if we want to extend the simplification we have for variable initializers to routine calls, that would have severe restrictions. We would only be able to do this for subs, not method calls as the latter are late-bound, i.e. we cannot know at compile time, which code object will be called for that method. Thus we cannot know its signature and will not know whether we need to apply the special case or not. This will fix one surprise by introducing another. Suddenly something that works just fine for subs will fail when you turn that sub into a method.

As to the suggestion of adding special method-like syntax for postfix type-specifications: the example with Int doesn't look too bad and you already acknowledged that this would be a breaking change as .Int right now would give you the number of elements. However you would have to do this not just for Int but for any type, at least those types that can be used for literals. [1, 2, 3].Bool would suddenly no longer return True. Even worse, [1, 2, 3].Str would no longer be 1 2 3. Except of course when you first put it into a variable and then call .Str which is even more surprising.

Compared to all that, is Array[Int](1, 2, 3) really all that bad?

landyacht commented 7 months ago

I realize that from a technical standpoint, everything you said in the first paragraph is true, but something about it just doesn't sit right with me conceptually. Maybe it's because I'm still clutching my Perls, so to speak, but it seems like sigils ought to be more than pure syntax sugar. Regardless, being unable to pass what is clearly an array/hash/etc. of Foos into a routine that says it takes an array/hash/etc. of Foos is unexpected, frustrating, and discourages use of the optional typing feature.

Would type-checking each element be the end of the world performance-wise? I genuinely don't know, but I'm sure there are optimizations that could help skip such a check in many cases (e.g. containers can keep up with what's been put in them, to whatever degree of accuracy yields the best benefit-to-annoyance ratio). If performance becomes an issue, the user can specify types on the calling side to avoid that iterative check. This is what we do with decimal numbers - the additional performance of floating points is opt-in because rationals DWIM better.

I'm not sure I follow on the issue of special-casing for subs vs. methods. Why does this need to be a special case? Wouldn't making it so that e.g. [1, 2, 3] ~~ Array[Int] be sufficient? Already we can't even know which sub candidate will be called at compile time because of where clauses and subsets.
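
For reference, today's behavior on a recent Rakudo:

say [1, 2, 3] ~~ Array[Int];                # False: a plain Array is untyped
say Array[Int].new(1, 2, 3) ~~ Array[Int];  # True
say [1, 2, 3].are;                          # (Int): the tightest common element type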

librasteve commented 7 months ago

@niner - thanks for your feedback, I can see that my proposal will not fly

TIL that Array[Int](1,2,3) is a thing

I am now drawn to [1,2,3].typed as a synonym for Array[*.are](1,2,3) [pseudocode]

gfldex commented 6 months ago

There is a problem with binding and assignment with @ and %-sigiled containers. Please consider the following code.

sub foo(Int() @ints) { dd @ints };
foo [1,2,3];

This should DWIM but is NYI.

In Raku we use type constraints to limit the consideration of the implementor of a Routine. This is not the default because it limits composability. And we don't like that. I too made the observation that Raku-beginners don't understand the syntax of type constraints on collection containers. Most of them appear to reuse lessons they learned with statically typed languages. Raku does not want to be one of those languages.

Indeed, type constraints on collections were meant to allow optimisations. Rakudo does not take advantage of that right now. But be careful what you wish for. Modern C++ provides excellent support for typed containers. If you consider having to spend 3 lines of code to specify a single argument's type as excellent.

landyacht commented 6 months ago

@gfldex could you explain what you mean by limiting composability?

I'm also not sure how replacing Int with Int() changes anything related to this problem. If the compiler rejects passing an untyped array/list consisting of Ints into a Positional[Int] parameter, why would it allow passing it into a Positional[Int()] parameter? I would expect putting a coercion type on an @-sigiled parameter to allow individual elements to coerce, which is not the same thing as truly restricting the type of each element, nor the same thing as having the whole container coerce (i.e. Positional[Int]()). In other words, I would expect the following...

multi foo(Int @bar) { say 'Candidate I'; }
multi foo(Int() @bar) { say 'Candidate II'; }
multi foo(@bar) { say 'Candidate III'; }

class NoIntCoerce { }

foo (1, 2, 3); # Candidate I
foo ('1', '2', '3'); # Candidate II (Strs coerce to Ints)
foo (NoIntCoerce.new xx 3); # Candidate III

dontlaugh commented 6 months ago

Regardless, being unable to pass what is clearly an array/hash/etc. of Foos into a routine that says it takes an array/hash/etc. of Foos is unexpected, frustrating, and discourages use of the optional typing feature.

I am actually pretty n00b at using Raku's type constraints, but I've spent most of my life in statically typed languages like Go and Rust. I have experienced this "discouragement" first hand, maybe even with an example exactly like this.

I want to be able to start with something untyped and change the function signature later on to add constraints. Here is a contrived TypeScript example.

// Untyped
function sumTheNums(nums) {
    let sum = 0;
    for (let i = 0; i < nums.length; i++) {
        sum += nums[i];
    }
    return sum;
}

// Okay it's an array of numbers
function sumTheNums2(nums: Array<number>) {
    return sumTheNums(nums);
}

// Well, sometimes i want to concatenate strings for some reason, too
function sumTheNums3(nums: Array<number | string>) {
    return sumTheNums(nums);
}

let result = sumTheNums( [1, 2, 3, 4]);
    result = sumTheNums2([1, 2, 3, 4]);
    result = sumTheNums3(['1', '2', '3', '4']);
console.log(result);

I don't want to change the call sites. There could be a lot of those.

Anyways, to me it feels like the original example should work

 expected Positional[Int] but got Array ([1, 2, 3]).

If the docs are up to date, then Array is List and List does Positional. I don't know if this is how roles are supposed to work, but to me - coming from these other languages - the Array is a "narrower" type and we should be able to pass it in places like this.

alabamenhu commented 6 months ago

While ideally the

sub foo(Int() @bar) { ... }

will get implemented in a way to allow it to be called with foo [1,2,3], an alternative in the meantime could potentially be an (abuse) of traits. No doubt it hasn't been implemented yet because exactly how it will function is still a bit tbd. I know at one point in time I had done something akin to

multi sub foo( Int @bar ) { ... }
multi sub foo( @bar where *.all ~~ Int ) { 
    samewith Array[Int].new: @bar 
} 

The advantage of the multi approach is you get the speedy if you've properly typed the container, and fall back to molasses if not but without breakage (and the thing would be tightened from the rest of the call chain). I'd imagine via traits somehow we could manage to capture such instances of containers with typed elements and rinse and repeat in some way. Maybe I'll play around with that.

librasteve commented 6 months ago

@alabamenhu , I think what you are saying is we check items by default and require a trait to vouch that the checking has been delegated, like this:

multi sub foo( Int @bar is vouched ) {                         #<== same as :( Array[Int] $bar )
    ... 
}
multi sub foo( Int @bar ) {                                   #<== used to be :( @bar where *.all ~~ Int )
    samewith Array[Int].new: @bar 
} 
multi sub foo( Int() @bar ) {                                #<== same as  :( @bar where ! *.all ~~ Int )
    samewith @bar 
}

if so, I support it

jubilatious1 commented 6 months ago

An array cannot be constrained to Int in Raku.

AUTHOR Mea Culpa: @raiph refutes the above assertion and he appears to be correct. Early on I tried 'Typing' arrays constrained to Int and failed, so I assumed it couldn't be done (maybe I tripped across the same issue as @landyacht ?). Cheers.

TL;DR Summary of vector/Types in the R-programming language, below:

In other languages, strict Type-specificity applies. For example, the R-programming language has no scalars but only vectors (what other languages call a scalar is simply a vector of length == 1 in R).

However vectors in R MUST consist of a single Type, i.e. all character (i.e. string) elements, or all Integer elements, or all Numeric (i.e. Real) elements, or all Factors (memory-saving mechanism in R for dealing with Grouped variables). But never a mixture of Types (sometimes referred to as mode() or storage.mode() in R).

This is consistent with R's use as a statistical language: you'll have columns of experimental parameters (e.g. name, date, age) and columns of data (i.e. numeric). A user may manually convert a vector with something like as.numeric and/or as.character. To facilitate this mechanism, there are canonical coercion mechanisms which are invoked automatically, for example when a string is combined/added-to a vector of Ints:

https://www.r-bloggers.com/2013/09/type-conversion-and-you-or-and-r/

[ If you need a mixture of Types in your data-object, you advance to the next-most-complex data-object, which is a List in the R-programming language. In fact, two-dimensional dataframes in R are nothing more than Lists constrained to consist of equal-length vector elements (i.e. columns with an equal number of rows) ].


Raku decided that Arrays can contain multiple Types. So you need to either:

  1. Set up a mechanism whereby all array elements can be constrained to a single Type (e.g. Int), OR
  2. Set up a mechanism whereby positions can be "mapped-over" (e.g. using rotor/batch) with a user-supplied list of Types. For example a repeating pattern of Int, Str, Num can map-over repeating segments of @a array (analogous to a three column table). OR
  3. BOTH.

@thoughtstream @pmichaud @TimToady

landyacht commented 6 months ago

I personally like librasteve's latest proposal. IMO, the Raku Way is that user convenience should be default, and performance that interferes with convenience should be opt-in. The is vouched (I'm not sure about the verbiage, but I can't think of anything better of a reasonable length) trait providing a way to still have the @ sigil on something that would otherwise be a Positional inside a Scalar container is something I hadn't thought of, and I definitely agree with the underlying implication that having a typed Scalar is the right way to indicate the user's intent for stricter typing (i.e. to get what is currently the behavior of Type @foo).

Regarding performance, I believe we can cover the vast majority of likely cases with some relatively simple optimizations. The trickiest part would be a good "tightest common type" operator that handles smilies (:D, :U), special types like Failure, and the fact that types can do multiple roles. Maybe two related operators—one to walk up the class chain (returning a single value) and another up the role chain (returning one or more)—would be better. At any rate, my idea is that containers like Array, List, Hash, etc. can keep up with what's been put in them; or, more specifically, the tightest common type of what's been put in so far.

The above approach alone would lose specificity (but not correctness) if elements get removed, but I think we could partially combat that while staying in O(1) space by tracking a bit more information. I won't overplay my hand here, though...

The odds of something being both a large and highly heterogeneous array are very small IMO, so I think this relatively simple optimization will go a long way to preventing full scans of giant arrays and hashes.
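
A rough sketch of such a "tightest common type" helper, walking only the class (MRO) chain (the name and approach are illustrative, not an implementation proposal):

sub tightest-common-type(@values) {
    return Any unless @values;
    my $type = @values[0].WHAT;
    for @values[1..*] -> $v {
        # walk up the current candidate's MRO until the next value matches
        $type = $type.^mro.first: { $v ~~ $_ };
    }
    $type
}
say tightest-common-type([1, 2, 3]);      # (Int)
say tightest-common-type([1, 2e0, 3/2]);  # (Cool); a role-chain walk could find Numeric instead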

jubilatious1 commented 6 months ago

@landyacht I appreciate your commentary on Typescript. It's important to look at other languages to see what's currently resonating with programmers.

Someone named Larry Wall stated (paraphrasing) with regards to the Perl6/Raku re-write, "We've decided that it's better to change the language than change the user."

raiph commented 6 months ago

@jubilatious1

An array cannot be constrained to Int in Raku.

That's... quite a misunderstanding, to put it mildly, and needs to be addressed in case anyone who doesn't know Raku thinks your statement is correct. I'll try to ground understanding via code anyone can run to see that Raku very definitely allows arrays to be constrained to Int:

# Here's an @array whose elements are constrained to `Int`:
my Int @IntArray;

# Readers of the code know it, but so does Raku:
say @IntArray .of; # (Int)

# Here's an _assignment_ throwing an exception due to violating the constraint:
(try @IntArray = 1, 2, '3') // say $!; # Error ... expected Int but got Str ("3")

# Here's an assignment _succeeding_ by adhering to the constraint:
(try @IntArray = 1, 2, 3) andthen .say; # [1 2 3]

# Here's a function taking an array whose elements are constrained to `Int`:
sub foo (Int @IntArray) { 'successfully bound' }

# Here's a _binding_ that succeeds by adhering to the constraint:
(try foo @IntArray) andthen .say; # successfully bound

# Here's a _binding_ that throws an exception due to violating the constraint:
(try foo [1,2,3]) // say $!; # expected Positional[Int] but got Array ([1, 2, 3])

Presumably all of this code has surprised you but it's been as above since day one.

PS. Imo the error message in the last one gets to the heart of what is really great about Raku related to this, and what sucks.

jubilatious1 commented 6 months ago

@raiph states:

That's... quite a misunderstanding ... .

Not sure then, why the example of @librasteve fails?

sub foo(Int @a) { dd @a; }
foo [1, 2, 3].Int;       # works
foo [1, 2, 3];            # fails

Actually, I see both of @librasteve 's examples failing, but maybe because I'm on an older Rakudo (2023.05)?

~$ raku -e 'sub foo(Int @a) { dd @a; }; foo [1, 2, 3].Int;'
Type check failed in binding to parameter '@a'; expected Positional[Int] but got Int (3)
  in sub foo at -e line 1
  in block <unit> at -e line 1

librasteve commented 6 months ago

hi @jubilatious1

today, this is a proposal, and it does not work

sub foo(Int @a) { dd @a; }
foo [1, 2, 3].Int;          # fails
foo [1, 2, 3];            # fails

My first proposal was to add syntax so that .Int on an Array will coerce it to an Array[Int]. I did not say what to do about the elems, but presumably this would scan the Array and coerce each elem to Int or fail. I still quite like this proposal, but it is unrelated to my latest proposal ;-)

It is interesting to hear your description of R - since imo raku is trying to do all the things, it can have Array[Int] to strictly control the contents of your array and it can have Array which is a dynamic ragbag of any type you like. Array is a Positional set of Scalar containers, so in the dynamic case, each element's container knows the type of its contents (if any). In a typed Array all the types of all the Scalar containers of all the elements must match the Array type.
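
The container-level difference is visible via .of:

my @ragbag = 1, 'two', 3e0;
say @ragbag.of;             # (Mu): an untyped Array accepts anything
my Int @typed = 1, 2, 3;
say @typed.of;              # (Int): every element's Scalar container is Int-typed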

SO, I think the debate is about when and where the conformance of the Scalar container elements is checked. Today that is when elements are added to the Array, and we trust a typed Array like Array[Int] to police its elements and only allow Int elements in. Which, as I understand it, is pretty R-like.

I think that Rog is saying that we should also allow an untyped Array to be passed to a sub with a signature like :( Int @a ) provided that its elements are all Ints at that time (ie a runtime check).

librasteve commented 6 months ago

@landyacht --- yeah, maybe is delegated beats is vouched verbiage-wise

jubilatious1 commented 6 months ago

@librasteve thanks.

This doesn't work either, although your DWIM and my DWIM may not be in full agreement:

~$ raku -e 'sub foo( @a[Int,Str] ) { dd @a; }; foo [1, "A"];'
Constraint type check failed in binding to parameter '@a'; expected anonymous constraint to be met but got Array ([1, "A"])
  in sub foo at -e line 1
  in block <unit> at -e line 1

librasteve commented 6 months ago

sub foo( Int $i, Str $s ) { 'yo' };
foo |[1, "A"];   #yo

^^ to start with, this does work (ie a signature can be defined to decompose an array and control the types)

at the moment, raku does not consider repeating patterns of types in a signature (afaik)

but we do have itemization and multi-dimensional arrays, which feature prominently and often folks ask why this "overhead" --- that mechanism is there to recursively unpack patterns in any hierarchy of list or map or item

FCO commented 6 months ago

This also works:

sub foo( [ Int $i, Str $s ] ) { 'yo' };
foo [1, "A"];   #yo

FCO commented 6 months ago

sub foo( @a[Int,Str] ) { dd @a; }; foo [1, "A"]

If you add a space between the @a and the [...], you kinda get what you want (if I got it right) https://glot.io/snippets/gvymdn83cz

But I also think maybe it would make sense to extend Lizmat's Tuple to work like that (Tuple[Int, Str]). But I'm only thinking out loud...

jubilatious1 commented 6 months ago

Thanks @FCO, for trying it out. Really...I was only off-by-one-space? Amazing.

Also, I tried doing Capture and Signature to get more info, but no dice:

~$ raku -e 'sub foo( Int $i, Str $s ) { dd .Capture }; foo |[1, "A"];'
\()
~$ raku -e 'sub foo( Int $i, Str $s ) { dd .Signature }; foo |[1, "A"];'
No such method 'Signature' for invocant of type 'Any'
  in sub foo at -e line 1
  in block <unit> at -e line 1

~$ raku -e 'sub foo( Int $i, Str $s ) { say .Capture }; foo |[1, "A"];'
\()

jubilatious1 commented 6 months ago

Possibly relevant:

"Inconsistensy of container descriptor default value type for nominalizable types." #3

"Make subsets validate against their constraints same way as definites do." https://github.com/rakudo/rakudo/pull/2946

gfldex commented 6 months ago

The more I think about this, the more this feels like an ENODOC instead of an ENODESIGN.

    my Int @a = Array[Int].new: 1,2,3,Int;
    dd @a;
    my @b := Array[Int].new: 1,2,3;
    dd @b;
    sub foo(Int @c, Int @d) { dd @c }
    foo @a, @b;

This just works and is quite readable.

However, there is a Rakudobug because the following also works.

    my Int @a = Array[Int].new: 1,2,3,Int;
    sub foo(Int:D @b) { }
    foo @a;

vrurg commented 6 months ago

Whenever I need to pass in a typed array what I usually do is:

foo( my Int @ = [1,2,3] );

It's a bit cumbersome, but considering that it is needed once in thousands of lines of code – no big deal. But I do understand that it could be used more often with math algorithms.

Let's have a look at this from another perspective. I didn't have time to thoroughly read through the entire discussion here, but so far nobody mentioned hashes. And, yet, the same WAT exists for them too. Declaring a typed hash is even a bit more complicated than declaring an array.

Interesting that strictly typed languages were mentioned but nobody thought about as syntax as a possible option. It's an idea I got a few minutes ago, therefore it's very raw and poorly thought-out. But something like this might be an option:

foo [1,2] as Array[Int];
foo [1,2] as Int @;
bar { 12 => 1.2, 43 => 4 } as Hash[Rat, Int];
bar { 12 => 1.2, 43 => 4 } as Rat %{Int}; 

What's good about it is that it allows the compiler to create a correct constant object at compile time, avoiding run-time overheads. For non-constant objects it would wind down to a run-time coercion case which is less interesting and only makes sense for providing DWIM behavior.

Apparently, the syntax would also make it possible to avoid run-time coercion with any other, potentially constant, object.

jubilatious1 commented 6 months ago

@raiph:

my Int @a;
@a.push: "cat";
say @a; 

# Errors: 
Type check failed for an element of @a; expected Int but got Str ("cat")
  in block <unit> at <unknown file> line 1

But:

my Int @b;
@b.push: Int;
say @b;

# Returns:
[(Int)]

my Int @c; 
@c.push: Int;
@c.push: Nil;
say @c;

# Returns:
[(Int) (Int)]

I guess that makes sense, considering the design of the language. A pushed Nil will be coerced to a placeholder for an Int.

But back up a second: why are multi-element objects allowed to accept a single Type element at the head? In other languages there's a distinction between atomic/primitive, as opposed to aggregates of such atomic/primitive elements.

Not so in Perl/Raku?

lizmat commented 6 months ago

The Int constraint accepts both type objects as well as instances. Perhaps you meant:

$ raku -e 'my Int:D @b; @b.push: Int'
Type check failed for an element of @b; expected Int:D but got Int (Int) (perhaps Nil was assigned to a :D which had no default?)

Putting the value Nil in a container, will assume the default value for that container. Which in the case of my Int @a would be Int. But it can also be more precise:

$ raku -e 'my Int:D @b is default(42); @b.push: Nil; dd @b'
Int:D @b = Array[Int:D].new(42)

raiph commented 6 months ago

@jubilatious1

why are multi-element objects allowed to accept a single Type element at the head? ... In other languages there's a distinction between atomic/primitive, as opposed to aggregates of such atomic/primitive elements.

Not so in Perl/Raku?

It's the same in Raku as it is in other languages.

As liz has explained, an Int is an Int but there's the distinction between "instance objects" and "type objects". Many PLs use precisely the same distinction using precisely the same terminology; Python, for example.

alabamenhu commented 6 months ago

Interesting that strictly typed languages were mentioned but nobody thought about as syntax as a possible option. It's an idea I got a few minutes ago, therefore it's very raw and poorly thought-out. But something like this might be an option:

foo [1,2] as Array[Int];
foo [1,2] as Int @;
bar { 12 => 1.2, 43 => 4 } as Hash[Rat, Int];
bar { 12 => 1.2, 43 => 4 } as Rat %{Int}; 

What's good about it is that it allows the compiler to create a correct constant object at compile time, avoiding run-time overheads. For non-constant objects it would wind down to a run-time coercion case which is less interesting and only makes sense for providing DWIM behavior.

Apparently, the syntax would also make it possible to avoid run-time coercion with any other, potentially constant, object.

Actually, even better than this, this can be done today, in a module.

multi sub infix:<as>(Positional \source, Positional:U \target) {
    return source ~~ target ?? source !! target.new(source);
    CATCH { 
        die "Cannot coerce {source.WHAT.^name} into {target.WHAT.^name} using 'as'.\n"
          ~ "Perhaps one element didn't match the element type?";
    }
}

foo [1,2,3]; # Type check failed in binding to parameter '@a'; expected Positional[Int] but got Array ($[1, 2, 3])
foo [1,2,3] as Array[Int]; # works
foo [1,2,'c']; # Cannot coerce Array into Array[Int] using 'as'. Perhaps one element didn't match the element type?

I wrote that up in like two minutes. Obviously, more care would be needed for nested types like Array[Array[Int]] or for Associative types, but still, would not require a lot of work. This could be seen as a more "English like" way to do coercion, which has historically been a method call.
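
For what it's worth, a rough sketch of one level of that extra care (a recursive variant of the snippet above; Positional-only and not production-ready):

multi sub infix:<as>(Positional \source, Positional:U \target) {
    return source if source ~~ target;
    my \elem-type = target.of;
    target.new: source.map: {
        # recurse only when both the element and the target's element type are Positional
        $_ ~~ Positional && elem-type ~~ Positional ?? ($_ as elem-type) !! $_
    }
}
say ([[1,2],[3,4]] as Array[Array[Int]]).^name;  # Array[Array[Int]]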

dontlaugh commented 6 months ago

I don't know much about Rakudo internals, or type theory for that matter. So I'm just going off vibes. The vibe I get is that there's a real problem that this doesn't work.

role Photosynthesis {}
class Plant does Photosynthesis {}
class Turnip is Plant {}

sub harvest(Photosynthesis $one) {}
sub harvest-many(Photosynthesis @many) {}

sub MAIN {

  my Turnip $turnip;
  my Plant $tree;
  harvest($turnip); 
  harvest($tree);

  my Photosynthesis @garden := Array[Plant].new: [$turnip, $tree];
  harvest-many(@garden); # works

  my $forest := Array[Plant].new: [$turnip, $tree];
  harvest-many($forest); # works

  my @potted-fern := [$turnip, $tree];
  harvest-many(@potted-fern); # fails (the original example)
}

Again, the vibe I get is that passing @potted-fern should be correct. And I don't really like the idea of inventing syntax (or adding methods) to compensate for the fact that we're rejecting a correct program.

@niner wrote:

Sigils are shortcuts for type constraints (and default values). @ means Positional (default value Array). Int @ means Positional[Int] (default value Array[Int]). They are roughly equivalent. Thus sub foo(Int @a) means pretty much the same as sub foo(Positional[Int] $a). Thus while at first glance you may think that Int @a is a type constraint on an array's values, it really is a type constraint on the array variable itself. It means that @a holds an object that does the Positional role, parameterized by Int or in other words: an array-like thing that by definition can only hold Ints. We thus can rely on that object to have done any appropriate type checks on its members when they get stored into it. I.e. I don't have to check whether a Positional[Int] really only contains Ints.

The way I read this is: the program should work, but there are some implementation issues that make it difficult, perhaps really difficult.

But the core issue remains: should it work?

niner commented 6 months ago

No it should not. A type constraint of Int @a means "I accept a Positional[Int]". Or less accurate but hopefully easier to understand: "I accept an Array that can only hold Ints". It does not mean "I accept an array that only holds Ints". It is prescriptive rather than descriptive.

People would like it to be descriptive as that would be more convenient with literals. It would save 10 characters of typing. In the Int case at least. I do understand that wish. In CompUnit::Repository I kinda would have liked that as well.

However language design is always a compromise. In this case convenience clashes with type theory and performance. And it's not just theory. Let's look at the proposed solutions which generally fall into one of three categories:

If we change the type constraint to be descriptive, i.e. it will accept any array that holds values that fit the element-type constraint, then the following code would run without any errors:

foo([1, 2, 3]);
sub foo(Int @int-array) {
    @int-array[0] = "I'm definitely not an Int!";
}

This would "work" because we told the compiler that the function should check that the input array only holds Ints. It does, so the array is accepted. However, the array is not actually typed. It's still a generic array that can hold anything! How's that for a WAT?

Ok, so what about having the compiler check and convert the array automatically then? Changing the array type would fix this WAT. But then what about this code?

my @generic-array = 1, 2, 3; # This definitely can hold anything.
sub foo(Int @int-array) {
}
foo(@generic-array);
@generic-array.push: "harmless string"; # Boom! What? Why can't I put arbitrary things into my generic array anymore?

Call me old fashioned but this doesn't look any better to me. An object's type should not change like that (yes, we have mixins, but those don't change the base type). This would be very surprising and cumbersome to use. But at least we can fix this. After all we can simply create a new Array[Int]-object that's just initialized with the values passed to the function, right? Then the type change would only be effective within the function itself (as no type changes, we have a new object). Except, if arrays are always copied like that, how would you implement something like push @a, 1? Changes to the array would suddenly no longer be visible outside that function.

So that leaves us with manual coercion. This does away with all the magic and its side-effects. It's explicit, predictable and as @alabamenhu showed rather easy to implement even. I'd just wonder what the advantage of [1, 2, 3] as Array[Int] over Array[Int](1, 2, 3) is.
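
Spelled out, the manual-coercion route is:

sub foo(Int @a) { dd @a }
foo Array[Int](1, 2, 3);         # works: coerce at the call site
foo Array[Int].new(1, 2, 3);     # same thing, constructor form
foo( my Int @ = [1, 2, 3] );     # vrurg's anonymous-variable trick from above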

Btw. I got curious and wondered how Scala deals with this problem. I have become quite fond of that language as it's statically typed but still has a somewhat Rakuish feel to it. The answer is: type inference. Scala infers heavily and Array(1, 2, 3) will result in an Array[Int](1, 2, 3). So would that be an answer? Unfortunately not either. Reason is simply that Scala does not even have "anything goes" Arrays like Raku has. All arrays are typed in Scala. The closest you can come is Array[Any]. Currently Raku does not do type inference at all. Sounds outdated, but maybe we're actually ahead of the curve in this. Introducing type inference just for containers would be really weird and cause all sorts of surprises. Going beyond that would totally change the language. Maybe for the better, but then it's soooo easy to screw up language design.

As for language design: whenever you propose any new feature, change, whatever, 90% of your energy should go into finding out why it cannot work. That's actually good advice with code in general, but even more important when you design APIs and again that much more important when you design the language itself as it's really, really hard to take something back once you notice that the idea wasn't so great after all.

The reason why Raku has become such a nice language is that for 15 years people have shot each others' proposals down. What's in Raku now is basically what survived all these people trying to find the weird consequences of the ideas that were put forward. Even so there are plenty of bits that I personally think still create more hassle than they are worth (re-writing the compiler front-end is a great way to discover those). But it's close to impossible to remove them, because someone's using them and someone really likes them.

hwayne commented 6 months ago

I don't know what the right solution here is, but there's one big problem I have with the current state that I don't think I've seen anybody bring up yet.

As a raku novice, I've made this exact mistake of passing a literal array into an Int @a method. I had no idea what the error meant: why the heck wasn't Array ([1, 2, 3]) being accepted as a Positional[Int]? I wasn't able to figure out what I was doing wrong; in the end I just gave up and removed the type annotation. It's only now that I've read this discussion that I understand what the problem was!

So even if the behavior doesn't change, the error raised could be a lot friendlier to beginners. Something like

expected Positional[Int] but got Array ([1, 2, 3]) (Array[Any]). Did you forget to type annotate your argument? See this link for why this is a problem.

OR: did you mean to write foo(@a where .all ~~ Int)?

That's obvs too extreme for an error message, but it at least guides the beginner into understanding why their code is wrong.
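
For reference, the where fallback from that suggestion already works today:

sub foo(@a where .all ~~ Int) { dd @a }
foo [1, 2, 3];   # works: elements are checked at the call
foo [1, 'x'];    # dies: anonymous constraint not met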

Leont commented 6 months ago

While I mostly agree with what @niner says above, I do wish

sub foo(Int @ids is copy) { ... }

would DWIM. AFAICT it could do so with no downsides, but maybe I'm missing something.

jubilatious1 commented 6 months ago

@niner wrote:

... A type constraint of Int @a means "I accept a Positional[Int]". Or less accurate but hopefully easier to understand: "I accept an Array that can only hold Ints". It does not mean "I accept an array that only holds Ints". It is prescriptive rather than descriptive.

Ummm, confused here. Shouldn't that read: Int @a means "I accept atomic/primitive/scalar elements into this @a array, elements that can only contain Ints"?

AFAIK, the Int @a shorthand doesn't work for arrays-of-arrays, only arrays-of-scalars. Going back to @hwayne 's comment (direct quote):

I've made this exact mistake of passing a literal array into an Int @a method.

Cheers.

lizmat commented 6 months ago

Re @Leont 's idea of: sub foo(Int @ids is copy) { ... }. I like it. From an implementation point of view it would only need to disable typechecking if the is copy trait is specified on a Positional in a signature. And then let the underlying my Int @ids = 1,2,3 do the work and the typechecking.
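
Conceptually the is copy case could piggy-back on assignment semantics, lowering to something like this (foo-lowered is a hypothetical name):

sub foo-lowered(@in) {
    my Int @ids = @in;   # the assignment re-checks every element
    dd @ids;
}
foo-lowered [1, 2, 3];   # Int @ids = Array[Int].new(1, 2, 3)
foo-lowered [1, 'x'];    # dies: expected Int but got Str ("x")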

vrurg commented 6 months ago

I'd just wonder what the advantage of [1, 2, 3] as Array[Int] over Array[Int](1, 2, 3) is.

The advantage is in producing a compile-time typed array. The Type(...) syntax is somewhat overloaded and sometimes ambiguous. Type could happen to be a routine, for example. as would guarantee for the compiler that what's on its RHS is always a type, thus simplifying optimization.

alabamenhu commented 6 months ago

So that leaves us with manual coercion. This does away with all the magic and its side-effects. It's explicit, predictable and as @alabamenhu showed rather easy to implement even. I'd just wonder what the advantage of [1, 2, 3] as Array[Int] over Array[Int](1, 2, 3) is.

For simple ones no advantage. But if we could make it so that the as handles nested types, which I don't think the postfix parens handles. ((1,2),(3,4)) as Array[Array[Int]] would work, but Array[Array[Int]]((1,2),(3,4)) does not:

sink Array[Array[Int]]((1,2),(3,4)); 
# Type check failed in assignment to ; expected Array[Int] but got List ($(1, 2))

There are also two potential wins for readability. First, the type(vals) above got a long sequence of symbols which the val as type manages to break up visually. As well, we do a lot of things in Raku with postfixes after words because it puts core and more important information first. die if foo really highlights the potential for an error at that location, while if foo { die } or foo and die put more emphasis on the condition. Effectively identical, but with differences when trying to read and grok programmer intent. What do I intend to do with Array[Int](...);? That syntax doesn't jump out that I'm trying to do a coercion like perhaps (1,2,3).Array[Int] (if that were valid syntax) or (1,2,3) as Array[Int] does.

Also, see vrurg who posted between me starting this and posting this.

gfldex commented 6 months ago

Actually, even better than this, this can be done today, in a module.


multi sub infix:<as>(Positional \source, Positional:U \target) {
...

I was thinking the same. Having it in CORE as a macro-ish construct would allow the creation to be moved to CHECK time. The semantics would be like:

CHECK my Int @a = Array[Int].new: 1,2,3;

I believe we should favour making problems with literals compile-time errors. After all, somebody could sneak class Array is export { } into a module (only on April 1st, ofc).

landyacht commented 6 months ago

A type constraint of Int @a means "I accept a Positional[Int]". Or less accurate but hopefully easier to understand: "I accept an Array that can only hold Ints". It does not mean "I accept an array that only holds Ints". It is prescriptive rather than descriptive.

Why should it be that way, though? Programmers use these constraints to ensure that certain behaviors will be defined on the object(s) in certain ways—in other words, to get guarantees. If what the user really wants is truly, exactly Array[Foo], Hash[Foo], etc., then IMO Something[Foo] $scalar-contained would be far clearer and more idiomatic.

If we change the type constraint to be descriptive, i.e. it will accept any array that holds values that fit the element-type constraint, then the following code would run without any errors:

foo([1, 2, 3]);
sub foo(Int @int-array) {
    @int-array[0] = "I'm definitely not an Int!";
}

This would "work" because we told the compiler that the function should check that the input array only holds Ints. It does, so the array is accepted. However, the array is not actually typed. It's still a generic array that can hold anything! How's that for a WAT?

Ok, so what about having the compiler check and convert the array automatically then? Changing the array type would fix this WAT. But then what about this code?

my @generic-array = 1, 2, 3; # This definitely can hold anything.
sub foo(Int @int-array) {
}
foo(@generic-array);
@generic-array.push: "harmless string"; # Boom! What? Why can't I put arbitrary things into my generic array anymore?

Call me old fashioned but this doesn't look any better to me. An object's type should not change like that (yes, we have mixins, but those don't change the base type). This would be very surprising and cumbersome to use. But at least we can fix this. After all we can simply create a new Array[Int]-object that's just initialized with the values passed to the function, right? Then the type change would only be effective within the function itself (as no type changes, we have a new object). Except, if arrays are always copied like that, how would you implement something like push @a, 1? Changes to the array would suddenly no longer be visible outside that function.

I agree that the two samples are both major WATs and are not acceptable ways to solve this problem. However, what's stopping us from making it Just Work™? In other words, don't change the type constraint on the caller-side container, but do restrict it within the (dynamic) scope of the called subroutine. Perhaps a new transparent wrapper object that almost always delegates to the passed-in object but adds type checks/guarantees? The only issue I can see would arise when the caller side is more restrictive than the callee, but this is already a problem in the current state of Raku and thus would not be a regression:

my Int @ints = 1, 2, 3;
sub foo(Numeric @nums) {
    @nums.push: 1e0;
}
foo @ints; # argument accepted because Int ~~ Numeric, yet the push will fail (true now too)

In the case where the caller passed in an unconstrained object not assigned into any variable, mutating the thing wouldn't matter anyway because the callee has the only reference to it. If a variable is being used instead, then one of the following is true:

  1. The type constraints are exactly the same: presents no issues now nor after the proposed change.
  2. The callee type constraint is more specific and the elements happen to match: presents an issue now (the compiler/runtime won't accept the call) but would not after the proposed change since nothing violating the caller's type constraint can be inserted. The caller's constraint will not be altered.
  3. The callee type constraint is less specific: presents a potential issue with mutation that is currently revealed with an exception whose message could be improved as part of the proposed change.

librasteve commented 6 months ago

Way up in this thread, I suggested:

I am now drawn to [1,2,3].typed as a synonym for Array[*.are](1,2,3) [pseudocode]

In the light of the many good comments since then, I would adjust this suggestion to be:

[1,2,3] as Int as shorthand for Array[Int](1,2,3)

Unlike the above proposals around the as keyword, this one does not require 'Array[...]' to be specified (when applied to a Positional). So it better echoes the capture syntax, like this:

sub foo(Int @a) {dd @a}
foo( [1,2,3] as Int );                  #Array[Int].new(1, 2, 3)

PS. While I have also upvoted an is vouched trait, I guess that the drift is away from messing with the Capture behaviour.

FCO commented 6 months ago

What would that do for List? Break? I mean (1, 2, 3) as Int

landyacht commented 6 months ago

Quoting my own comment here:

Perhaps a new transparent wrapper object that almost always delegates to the passed-in object but adds type checks/guarantees?

Another possibility is a low-level operation that allows changing (and then restoring upon scope exit) the type parameter of Positional, Associative, etc. This would, of course, only happen after element-checking (or cache-checking in the optimized case) if it's being changed to something more restrictive. Anyway, this is getting into implementation details, but I don't see why it would be impossible technically speaking, and I do think it would align far better with user expectations regardless of exactly how it's implemented. Such a change shouldn't break anything too bad, although it will technically be backward incompatible where users rely on it working exactly how it does now (which seems unlikely):

multi foo(@unconstrained) { 'User wants this candidate' }
multi foo(Bar @constrained) { 'Not this candidate' }
foo [Bar.new]; # called candidate would be different under the proposed change

raiph commented 6 months ago

@lizmat

I recall you suggesting something like [1,2,3] :Int as a way to specify a literal's element type, but that being rejected. Do you recall that suggestion, and why you or someone else ended up rejecting it?

lizmat commented 6 months ago

That syntax could be made to work:

$ raku -e 'multi circumfix:<[ ]>(*@a, Bool:D :$Int!){ my Int @ = @a }; dd [1,2,3]:Int'
Int @ = Array[Int].new(1, 2, 3)

More generally, allowing this for any type would be slightly more complicated, but still doable.

I'm not sure who suggested this syntax. Nor do I recall if it was rejected, and if it was, for what reason.

niner commented 6 months ago

Another possibility is a low-level operation that allows changing (and then restoring upon scope exit) the type parameter of Positional, Associative, etc. This would, of course, only happen after element-checking (or cache-checking in the optimized case) if it's being changed to something more restrictive. Anyway, this is getting into implementation details, but I don't see why it would be impossible technically speaking, and I do think it would align far better with user expectations regardless of exactly how it's implemented. Such a change shouldn't break anything too bad, although it will technically be backward incompatible where users rely on it working exactly how it does now (which seems unlikely):

How would that work with multi-threading when a different thread accesses the same array but is not in the same scope? Or co-routines which might do the same?

jubilatious1 commented 6 months ago

~ % raku
Welcome to Rakudo™ v2023.05.
Implementing the Raku® Programming Language v6.d.
Built on MoarVM version 2023.05.

To exit type 'exit' or '^D'
[0] > my @a[Int]
Ignoring [Int] as shape specification, did you mean 'my Int @foo' ?
  in block <unit> at <unknown file> line 1

[0] > my Int @a[]
Cannot create a  dimension array with only 0 dimensions
  in block <unit> at <unknown file> line 1

[0] > my Int @a;
[]

Have 'shaped' arrays been implemented yet? Is this issue a consequence of the fact that Type-specification internal to the [ ... ] square brackets has been reserved for 'shape' specification? Thx.

raiph commented 6 months ago

@dontlaugh

I am actually pretty n00b at using Raku's type constraints, but I've spent most of my life in statically typed languages like Go and Rust. I have experienced ... "discouragement" first hand

The consensus that's developed around comments by several core devs focuses on coercion. I'm with that consensus, and seek to build on it. As part of that, I think your examples are excellent, and show up weaknesses in both the cultural information space (you could in theory have known about the options that I show but perhaps didn't) and the coercion capabilities we currently have (which are good enough in some ways, certainly for your examples, but are missing some polish) that I'd love to see discussed.

I want to be able to start with something untyped and change the function signature later on to add constraints. Here is a contrived TypeScript example. ... I don't want to change the call sites. There could be a lot of those.

I've translated your example to Raku:

say sumTheNums([1, 2, 3, 4]);          # 10
say sumTheNums2([1, 2, 3, 4]);         # 10
say sumTheNums3(['1', '2', '3', '4']); # 10

subset NumericOrStringy where Numeric | Stringy;
sub sumTheNums(                           \nums) { sum nums }
sub sumTheNums2(Array[Numeric]()          \nums) { sumTheNums(nums) }
sub sumTheNums3(Array[NumericOrStringy]() \nums) { sumTheNums(nums) }

The Array[Numeric]() is a coercion type that coerces the passed array to an Array[Numeric]. Imo that's fine as is. Things get thornier if an @var is used, but I'm going to skip discussion of that for this first comment about your examples.

To match the semantics for the final signature I had to create a subset.

alabamenhu commented 6 months ago

What would that do for List? Break? I mean (1, 2, 3) as Int

sub foo(@bar) {
    say @bar 
}

foo (1, 2, 3) as Int;
# Calling foo(Int) will never work with declared signature (@bar)

Why? Because (1,2,3) coerces to Int providing the value 3. 3 is a scalar/Int, not a Positional, so the argument doesn't match the parameter.

raiph commented 6 months ago

@niner wrote:

I'd just wonder what the advantage of [1, 2, 3] as Array[Int] over Array[Int](1, 2, 3) is.

@vrurg replied:

[an] advantage is in producing compile-time typed array. The Type(...) syntax is somewhat overloaded and sometime ambiguous. Type could happen to be a routine, for example. as would guarantee for the compiler that what's on its RHS is always a type thus simplifying optimization.

@alabamenhu replied:

For simple ones no advantage. But [what] if we could make it so that [this works]?:

[[1,2],[3,4]] as Array[Array[Int]]

There are also two potential wins for readability [to do with postfix syntax].


I recall someone suggesting of, but it seems that's from my research of prior discussions, not from this one. Anyhow, I suggest a tweak in thinking about the above to:

[[1,2],[3,4]] of Array[Int]

In other words: