Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

[feature] Types in Perl #17894

Open Ovid opened 3 years ago

Ovid commented 3 years ago

This will be a somewhat contentious issue, and I'll be a bit pedantic at times for those reading this ticket who don't understand all of the issues involved (that includes myself). In particular, I'm going to give a long, rambling justification which I'm sure P5P doesn't need, but it's here to give background to everyone else reading this.

TL;DR

We need to standardize our type syntax and semantics.

(And yes, TL;DRs need to be at the top of documents, not the bottom)

Typed Signatures

Dave Mitchell has been doing awesome work on subroutine signatures, something that is long overdue in the language (I honestly expected them as part of the Perl 6 project back in 2000).

Part of his proposal deals with types in signatures. The proposal is impressive and, from the synopsis, we have this:

sub f(
        $self isa Foo::Bar,         # croak unless $self->isa('Foo::Bar');
        $foo  isa Foo::Bar?,        # croak unless undef or of that class
        $a!,                        # croak unless $a is defined
        $b    is  Int,              # croak if $b not int-like
        $c    is  Int?,             # croak unless undefined or int-like
        $d    is PositiveInt,       # user-defined type
        $e    is Int where $_ >= 1, # multiple constraints
        $f    is \@,                # croak unless  array ref
        $aref as ref ? $_ : [ $_ ]  # coercions: maybe modify the param
) { ...};
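
For comparison, here is a rough sketch of what a few of those annotations mean when written by hand in today's Perl. It is hypothetical in several ways: `Foo::Bar` is a stub class, `is_int_like` assumes "int-like" means an optional sign followed by ASCII digits (the proposal doesn't pin this down), and `$a`/`$b` are renamed to `$x`/`$y` because they are special variables used by `sort`.

```perl
use strict;
use warnings;
use Carp qw(croak);
use Scalar::Util qw(blessed);

package Foo::Bar { sub new { return bless {}, shift } }    # stub class for the example

# Assumed meaning of "int-like": optional sign, then ASCII digits only.
sub is_int_like {
    my ($v) = @_;
    return defined $v && !ref $v && $v =~ /\A[-+]?[0-9]+\z/;
}

# Hand-written version of: sub f($self isa Foo::Bar, $foo isa Foo::Bar?, $x!, $y is Int, $z is Int?)
sub f {
    my ($self, $foo, $x, $y, $z) = @_;
    croak '$self must be a Foo::Bar'
        unless blessed($self) && $self->isa('Foo::Bar');
    croak '$foo must be undef or a Foo::Bar'
        unless !defined $foo || (blessed($foo) && $foo->isa('Foo::Bar'));
    croak '$x must be defined'           unless defined $x;
    croak '$y must be int-like'          unless is_int_like($y);
    croak '$z must be undef or int-like' unless !defined $z || is_int_like($z);
    return 1;
}
```

Each of those croak lines has to be repeated in every sub that takes such arguments, which is a large part of the appeal of declaring them once.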

Interestingly, the very first response starts with this:

Yuck. This is a huge amount of new syntax to add. The new syntax doesn't pull its weight, given that it can only be used in this one context. If you're adding a bunch of syntax for type constraints, it should also be available for type checking purposes outside signatures.

There are a number of interesting comments about the proposal, but I want to focus on my primary concern: "type checking purposes outside signatures".

Long, Rambling Justification

When I work on large systems, one of the most frequent bugs I encounter is when data X is passed from foo() to bar() to baz() to quux() and while quux() was expecting an integer, it received something that was not an integer.

If I'm very lucky, the code dies a horrible death and I have to walk back through the call chain to figure out exactly where the bad data originated. Frankly, I'd rather rip out my intestines with a fork than have to do that again.

If I'm really unlucky, however, the code doesn't die. Instead, it just silently gives terribly bad, wrong, no good rubbish. And no warning at all that something has gone wrong.

$ perl -Mstrict -Mwarnings -E 'say [] + 1'
140473498905217

Oh, that's not good. So let me validate my argument with a regex!

$ perl -Mstrict -Mwarnings -E 'say [] =~ /\d/ ? "Kill me now" : "Whew!"'
Kill me now

Ah, so I need to be more careful.

$ perl -Mstrict -Mwarnings -E 'my $d = []; say defined $d && !ref $d && $d =~ /\d/ ? "Kill me now" : "Whew!"'
Whew!

OK, that's better. Finally I'm safe.

$ perl -CAS -Mstrict -Mwarnings -E 'my $d = chr(43270); say defined $d && !ref $d && $d =~ /\d/ ? "Kill me now: $d " : "Whew!"'
Kill me now: ꤆

I don't even know what ꤆ is (Google tells me it's part of the Paris metro line, but I'm a wee bit skeptical on that), but I know I forgot the /a switch on my regex. And I'll bet most casual Perl developers don't know about the /a switch, and I know for a fact that most large systems don't try to validate their types because it's a pain, it's more grunt work, and it's fraught with error. Or as I like to say "It'̸s ̴a pai̶n, ͝it͏'͠s͢ ̛m͞o҉ré grun͝t̕ ̴w̛ork, a̸nd ̧it's͘ f́ra̛ught wi̧th ̴er̕r̴o̷r̵."
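
A slightly safer hand-rolled check, as a sketch: anchor the pattern with `\A` and `\z` (since `$` also matches before a trailing newline) and add `/a` (Perl 5.14+) so that `\d` only matches ASCII 0-9. The function name is made up for illustration, and it still says nothing about range or overflow.

```perl
use strict;
use warnings;

# \A and \z anchor the whole string ($ would also accept a trailing newline).
# The /a flag restricts \d to ASCII 0-9, so Unicode digits such as
# chr(43270), U+A906 KAYAH LI DIGIT SIX, no longer match.
sub looks_like_ascii_int {
    my ($v) = @_;
    return defined $v && !ref $v && $v =~ /\A-?\d+\z/a;
}

for my $candidate (42, -7, "4.2", "42\n", [], chr(43270)) {
    my $verdict = looks_like_ascii_int($candidate) ? 'accepted' : 'rejected';
    print "$verdict\n";
}
```

Even this small function shows how many details (undef, references, anchoring, Unicode) every hand-written check has to get right.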

So I applaud David's work, but then there's Cor.

Cor Types

As many of you know, Cor is intended to be the new object system for the Perl core. If you don't want to wade through the wiki, you can watch this talk I gave on Cor.

One thing I only briefly touched on, and didn't really get into, is typing. So, here's a pointless Python Point class to illuminate this point:

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def inverse(self):
        return Point(self.y,self.x)

point = Point(7,3.2).inverse()
print(point.x)
point.x = "foo"
print(point.x)

As you can see, at the end, I set x to the string "foo". How can I prevent that when working on a million-line code base? Well, according to Pythonistas, it's "unpythonic" to validate your arguments. Even Perl developers, largely via the Moo/se family of OO, seem to have grudgingly admitted that yeah, asserting your types isn't such a bad thing.

So while Dave Mitchell's been thinking about types in signatures, I've been thinking about them in Cor. Here's the above point class, with almost identical behavior:

class Point {
    has ($x, $y) :reader :writer :new :isa(Num);

    method inverse() {
        return Point->new( x => $y, y => $x );
    }
}

Except in the Cor world, calling $point->x("foo") would generate a runtime error, just as it would with Moo/se code. In the above, we have "slots" (instance data) declared with has. The attributes merely provide sugar for common things we need in OO systems. Thus, :isa(Num) provides my run time type checking.
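
For readers who haven't used Moo/se, here is a sketch of what that :isa(Num) buys, hand-rolled in plain Perl 5.14+ (for the package-block syntax). Everything here is illustrative: the accessor generator and the `_check_num` helper are made up, and `looks_like_number` from the core Scalar::Util module is only a stand-in for whatever Num would actually mean.

```perl
use strict;
use warnings;

package Point {
    use Carp qw(croak);
    use Scalar::Util qw(looks_like_number);

    # Stand-in for :isa(Num): reject undef, references and non-numeric strings.
    sub _check_num {
        my ($name, $value) = @_;
        croak "$name must be a Num"
            unless defined $value && !ref $value && looks_like_number($value);
        return $value;
    }

    # Generate combined reader/writer methods, roughly what :reader :writer might provide.
    for my $slot (qw(x y)) {
        no strict 'refs';
        *{"Point::$slot"} = sub {
            my $self = shift;
            $self->{$slot} = _check_num($slot, shift) if @_;
            return $self->{$slot};
        };
    }

    sub new {
        my ($class, %args) = @_;
        my $self = bless {}, $class;
        $self->$_($args{$_}) for qw(x y);    # the constructor goes through the same checks
        return $self;
    }

    sub inverse {
        my ($self) = @_;
        return Point->new(x => $self->y, y => $self->x);
    }
}

my $point = Point->new(x => 7, y => 3.2)->inverse;
print $point->x, "\n";                        # 3.2
eval { $point->x('foo'); 1 } or print "rejected: $@";
```

Cor's point is that all of this collapses into a single has line, with the check declared rather than written.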

And that brings me to the next problem.

SYNTAX

Traditionally, we tend to see types defined in front of the variables, such as declaring an integer in C: int c. In Dave's proposal, it's after the variable: $c is Int. In Cor, we have has $c :isa(Int). Of course, there's also this loveliness:

$ perl -E 'package Int {} my Int $c = "Not an Int"; say $c'
Not an Int

But that syntax has been with us for years and is largely ignored. And if we want to attach additional semantics to the type (e.g., coercions), having it after the variable instead of before is probably a good idea.

In short, optional typing in Perl is long overdue, it's planned for signatures, it's planned for Cor, and eventually someone will want to write:

package Foo {
    my $c :isa(Int) = 3;
    # or
    my $c is Int = 3;
    # or
    ...
}

But if we have types, we desperately need to ensure that "there's more than one way to do it" doesn't apply (sorry, Perlers!). Because if signatures use one type syntax, Cor uses another, and regular variables possibly use one or the other (heaven forbid we get a third!), then it's going to be a confusing mess that frankly, I don't want to try to deal with.

And then there's this bit from Dave's proposal:

$e is Int where $_ >= 1, # multiple constraints

I quite like that, but I'm unsure how I would fit the constraints into Cor's syntax. That being said, Cor provides additional behavior to class slots via attributes and it might be a touch disappointing to have an exception here (but I'd live with it).

SEMANTICS

Syntax is nice, but the meaning of the syntax is important. For example, I think we can agree that for a type system, an integer shouldn't match ꤆, even if /\d/ does. But what does my $c :isa(Int) = -7/3; produce?

#include <stdio.h>
int main() {
   int c = -7/3;
   printf("%d",c);
   return 0;
}

The above C code compiles without a warning and prints -2. Perl, historically, doesn't do the "integer math" stuff and tries to avoid throwing away information:

$ perl -E 'say -7/3'
-2.33333333333333

This is in sharp contrast to other dynamic languages which often get this spectacularly wrong:

$ ruby -e 'puts -7/3'
-3

So, what does Perl do with my $c :isa(Int) = -7/3;? Should it just throw away the extra data? Should it be an error? Should it be a warning? Should the type be ignored?

And I'm not even going to try to figure out a type hierarchy right now, but an Int is a Num while the reverse isn't true. However, we'll need one well-defined and standardized, along with an extension mechanism.
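
To make "well-defined and standardized, along with an extension mechanism" a little more concrete, here is a toy sketch, loosely in the spirit of Moose's `subtype ... as ... where` declarations: each type has a parent and one extra check, a value satisfies a type only if it passes the whole chain, and "an Int is a Num" falls out of the parent links. All names (`%TYPE`, `define_type`, `satisfies`, `is_subtype`, the `Defined` root) are hypothetical, and it deliberately ignores parameterized types and coercions.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical type registry: every type names a parent and adds one check.
my %TYPE = (
    Defined => { parent => undef,     check => sub { defined $_[0] } },
    Num     => { parent => 'Defined', check => sub { !ref $_[0] && looks_like_number($_[0]) } },
    Int     => { parent => 'Num',     check => sub { $_[0] =~ /\A-?[0-9]+\z/ } },
);

# Extension mechanism: user code adds a type under an existing parent.
sub define_type {
    my ($name, $parent, $check) = @_;
    die "unknown parent type '$parent'\n" unless $TYPE{$parent};
    $TYPE{$name} = { parent => $parent, check => $check };
    return;
}

# Walk from a type up to the root, returning the names root-first.
sub _chain {
    my ($name) = @_;
    my @chain;
    for (my $t = $name; defined $t; $t = $TYPE{$t}{parent}) {
        die "unknown type '$t'\n" unless $TYPE{$t};
        unshift @chain, $t;
    }
    return @chain;
}

# A value satisfies a type only if it passes every check from the root down,
# so Int's regex never sees undef or a reference.
sub satisfies {
    my ($value, $name) = @_;
    for my $t (_chain($name)) {
        return 0 unless $TYPE{$t}{check}->($value);
    }
    return 1;
}

# Subtyping is "appears in my ancestor chain": Int is a Num, not vice versa.
sub is_subtype {
    my ($sub, $super) = @_;
    return scalar grep { $_ eq $super } _chain($sub);
}

define_type(PositiveInt => 'Int', sub { $_[0] >= 1 });
```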

leonerd commented 3 years ago

I definitely feel that something should exist, and it should have a common syntax feel between signature args, regular variables, object slots, and anywhere else we want to put it.

As to the specific spelling, I might prefer is over isa, at least if we're intending to have type constraints that aren't exactly equivalent to the new isa check operator.

my $num :is(Num);
sub add($c :is(Num), $d :is(Num)) { ... };
class Point {
    has ($x, $y) :is(Num);
    method jump ($dX :is(Num), $dY :is(Num)) { ... };
}

It would be nice to keep the neatness that the :isa type constraint is related to the isa test. By which I mean, that if you constrain a variable by :isa(ClassName) I would expect that to exactly correspond to allowing values for which $value isa ClassName is true. The reason that Dave M's idea involved both is and isa was to allow a new kind of "thing" of a type constraint, which can now allow new kinds of conditions that aren't just equivalent to isa class instance testing.

Leont commented 3 years ago

What about this?

my $c :isa(Int) = 1;
sub foo {
    $_[0] = "HERE";
}
foo($c);

Input checking is trivial to do correctly, but referential correctness is a PITA.
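
Leont's example runs today (minus the hypothetical `:isa(Int)`), and it shows why checking at declaration or at sub entry is not enough: the elements of `@_` are aliases for the caller's variables, so `foo` writes straight back into `$c`. A minimal demonstration in current Perl:

```perl
use strict;
use warnings;

# Elements of @_ are aliases to the caller's variables, so assigning to $_[0]
# writes straight back into the caller's $c, bypassing any check made on entry.
sub foo {
    $_[0] = "HERE";
}

my $c = 1;       # imagine this were declared as: my $c :isa(Int) = 1;
foo($c);
print "$c\n";    # prints HERE
```

Parameters declared through subroutine signatures are copies rather than aliases, so this particular hole is specific to `@_`, but other aliasing constructs (`for` loops, `map`, `grep`) and references raise the same question.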

Not least because there really is only one scalar type, and all other types are subtypes of it (in the Raku sense of the word).

atoomic commented 3 years ago

my 2cts: being able to set a type constraint in a function signature seems a big win to me

djzort commented 3 years ago

Cperl has implemented typing as described http://perl11.org/cperl/#coretypes:-Int-UInt-Num-Str and http://perl11.org/cperl/perltypes.html#native-types It's in code already. It's not vaporware.

The is and as operators are in Hack (Facebook's new direction on PHP), see https://docs.hhvm.com/hack/types/introduction

That isn't a reason to adopt them, just an observation. How they have introduced soft type hints is very interesting to me, even if it's syntactically ugly. They have used it to introduce more and more typing into a giant codebase of what was PHP.

Leont commented 3 years ago

my 2cts: being able to set a type constraint in a function signature seems a big win to me

I'm not disagreeing. I am saying that there's a big difference between "checking types at input" and "checking types on every assignment".
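
One way to see that difference in today's Perl is tie, which can intercept every store into a variable. The sketch below (the `IntScalar` class and its integer regex are hypothetical) rejects the `$_[0] = "HERE"` assignment from Leont's example, but every read and write goes through a method call, so it illustrates the semantics of per-assignment checking rather than how a core implementation should work.

```perl
use strict;
use warnings;

# Hypothetical IntScalar: a tied scalar whose STORE rejects anything that is
# not an optional minus sign followed by ASCII digits.
package IntScalar {
    use Carp qw(croak);

    sub TIESCALAR {
        my ($class, $initial) = @_;
        my $self = bless { value => undef }, $class;
        $self->STORE($initial) if defined $initial;
        return $self;
    }

    sub FETCH { return $_[0]{value} }

    sub STORE {
        my ($self, $new) = @_;
        croak 'not an Int: ' . (defined $new ? $new : 'undef')
            unless defined $new && !ref $new && $new =~ /\A-?[0-9]+\z/;
        $self->{value} = $new;
        return;
    }
}

tie my $c, 'IntScalar', 1;    # stands in for: my $c :isa(Int) = 1;

sub foo { $_[0] = "HERE" }

eval { foo($c); 1 } or print "caught: $@";
print "$c\n";                 # still 1
```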

wbraswell commented 3 years ago

TL;DR: RPerl has a real type system. We want to work directly with @Ovid & P5P to standardize the Perl type system. We can add type name aliases to help match type names between RPerl & others.

[[[ RPERL DATA TYPES ]]]

RPerl has a relatively mature strong type system, which uses the little-known slot in the Perl grammar that occurs between the my keyword and the $foo or @foo or %foo variable names.

Example code:

my integer $foo = -23;
my number $bar = 23.456_789;
my string $baz = 'howdy!';

my integer_array @foo_array = ( 1, -2, 3 );
my integer_arrayref $foo_arrayref = [ 4, 5, -6 ];

my string_hash %baz_hash = ( a => "howdy", b => "hello", c => "hola" );
my string_hashref $baz_hashref = { e => "hey", f => "hi", g => "hiya" };

The RPerl type system already works in all 3 primary compilation modes:

PERLOPS_PERLTYPES
CPPOPS_PERLTYPES
CPPOPS_CPPTYPES

We refer to the RPerl type system as "real" data types, because they are not simulated soft types which can be freely intermixed without consequence. RPerl types are real C89 and C++ data types which have all the same properties and capabilities and constraints as ordinary C(++) code.

For the "PERLTYPES" modes, RPerl uses the existing IV/NV/PV/AV/HV types from the perl.h C89 source code.

For the "CPPTYPES" mode, RPerl uses real C++ data types. There are "core" types which match the PERLTYPES listed above:

INTEGER aka IV: typedef long long integer; https://github.com/wbraswell/rperl/blob/master/lib/RPerl/DataType/Integer.h#L16

NUMBER aka NV: typedef double number; https://github.com/wbraswell/rperl/blob/master/lib/RPerl/DataType/Number.h#L16

STRING aka PV: typedef std::string string; https://github.com/wbraswell/rperl/blob/master/lib/RPerl/DataType/String.h#L9

ARRAY aka AV: typedef std::vector<integer> integer_arrayref; https://github.com/wbraswell/rperl/blob/master/lib/RPerl/DataStructure/Array/SubTypes1D.h#L18

HASH aka HV: typedef std::unordered_map<string, integer> integer_hashref; https://github.com/wbraswell/rperl/blob/master/lib/RPerl/DataStructure/Hash/SubTypes1D.h#L18

(I am currently in the process of implementing array-by-value and hash-by-value, so the AV & HV typedefs will probably change a bit in the near future.)

In addition to the above "core" types, RPerl also implements:

BOOLEAN: typedef bool boolean; https://github.com/wbraswell/rperl/blob/master/lib/RPerl/DataType/Boolean.h#L11

UNSIGNED_INTEGER: typedef unsigned long int unsigned_integer; https://github.com/wbraswell/rperl/blob/master/lib/RPerl/DataType/UnsignedInteger.h#L9

GMP_INTEGER: typedef mpz_t gmp_integer; https://github.com/wbraswell/rperl/blob/master/lib/RPerl/DataType/GMPInteger.h#L14 (This type will probably be split out into an optional RPerl library in the future.)

CHARACTER: typedef char character; https://github.com/wbraswell/rperl/blob/master/lib/RPerl/DataType/Character.h#L11

Example code:

my boolean $quux = 0;
my unsigned_integer $qaax = 23;
my gmp_integer $qiix  = gmp_integer->new();  gmp_init_set_unsigned_integer( $qiix, 1 );
my character $qoox = 'a';

The RPerl data types have several features which already work in all 3 compilation modes, such as runtime type checking in PERLTYPES modes which behaves exactly the same way as the compile-time type checking in CPPTYPES mode. As part of my current work on implementing array-by-value and hash-by-value, I am adding C++ template features to the RPerl type system, which should significantly simplify the CPPTYPES data structures for arrays and hashes.

We can always add more type name aliases such as Int for integer, Str for string, etc. This would allow RPerl to accept Moose-style type names.

For more info, some documentation does already exist: http://rperl.org/learning/#CHAPTER_2%3A_SCALAR_VALUES_%26_VARIABLES_(NUMBERS_%26_TEXT)

[[[ RPERL SUBROUTINE ARGUMENTS & RETURN TYPES ]]]

All RPerl subroutines require data types to be specified for all input arguments via the @ARG array, as well as the return value via the special $RETURN_TYPE string.

Example code:

sub pies_are_round {
    { my void $RETURN_TYPE };
    print 'in pies_are_round(), having PIE() = ', PIE(), "\n";
    return;
}

sub pi_r_squared {
    { my number $RETURN_TYPE };
    ( my number $r ) = @ARG;
    my number $area = PI() * $r ** 2;
    print 'in pi_r_squared(), have $area = PI() * $r ** 2 = ', $area, "\n";
    return $area;
}

sub garply {
    { my number_arrayref $RETURN_TYPE };
    ( my integer $garply_input, my number_arrayref $garply_array ) = @ARG;
    my integer $garply_input_size = scalar @{$garply_array};
    my integer $ungarply_size_typed = scalar @{my integer_arrayref $TYPED_ungarply = [4, 6, 8, 10]};
    my number_arrayref $garply_output = [
        $garply_input * $garply_array->[0],
        $garply_input * $garply_array->[1],
        $garply_input * $garply_array->[2]
    ];
    return $garply_output;
}

sub gorce {
    { my string_hashref $RETURN_TYPE };
    ( my integer $al, my number $be, my string $ga, my string_hashref $de) = @ARG;
    return {
        alpha => integer_to_string($al),
        beta  => number_to_string($be),
        gamma => $ga,
        delta => %{$de}
    };
}

[[[ RPERL CLASS TYPES ]]]

RPerl uses both slots in the Perl grammar for class names, because variable declaration and class constructor call can be split apart into 2 separate statements.

Example code, 1 statement: my Some::Class $some_object = Some::Class->new();

Example code, 2 statements:

my Some::Class $some_object;
# ... possibly other code here ...
$some_object = Some::Class->new();

While it may seem redundant to use both slots in the 1-statement example, we can easily see from the 2-statement example that we would have to rely on type inference if we did not use the first grammar slot, which could be added in future versions of RPerl.

[[[ RPERL METHOD ARGUMENTS & RETURN TYPES ]]]

Methods are very similar to subroutines, with the addition of the automatically-generated ::method type suffix.

Example code:

sub quux {
    { my void::method $RETURN_TYPE };
    ( my object $self) = @ARG;
    $self->{plugh} = $self->{plugh} * 2;
    return;
}

sub quince {
    { my integer::method $RETURN_TYPE };
    my string $quince_def
        = '...Cydonia vulgaris ... Cydonia, a city in Crete ... [1913 Webster]';
    print $quince_def;
    return (length $quince_def);
};

sub qorge {
    { my string_hashref::method $RETURN_TYPE };
    ( my object $self, my integer $qorge_input ) = @ARG;
    return {
        a => $self->{xyzzy} x $qorge_input,
        b => 'howdy',
        c => q{-23.42}
    };
}

sub qaft {
    { my RPerl::CompileUnit::Module::Class::Template_arrayref::method $RETURN_TYPE };
    ( my object $self, my integer $foo, my number $bar, my string $bat, my string_hashref $baz ) = @ARG;
    my RPerl::CompileUnit::Module::Class::Template_arrayref $retval = [];
    $retval->[0] = RPerl::CompileUnit::Module::Class::Template->new();
    $retval->[0]->{xyzzy} = 'larry';  # saint or stooge?
    $retval->[1] = RPerl::CompileUnit::Module::Class::Template->new();
    $retval->[1]->{xyzzy} = 'curly';
    $retval->[2] = RPerl::CompileUnit::Module::Class::Template->new();
    $retval->[2]->{xyzzy} = 'moe';
    return $retval;
}

In the final subroutine qaft() above, we can also see how a method can return an array of objects of the class RPerl::CompileUnit::Module::Class::Template:

    { my RPerl::CompileUnit::Module::Class::Template_arrayref::method $RETURN_TYPE };

[[[ RPERL CLASS PROPERTIES ]]]

RPerl currently uses a homegrown OO system, which will hopefully soon be removed & replaced by Ovid's Cor OO syntax.

Example code:

our hashref $properties = {
    plugh => my integer $TYPED_plugh         = 23,
    xyzzy => my string $TYPED_xyzzy          = 'twenty-three',
    thub  => my integer_arrayref $TYPED_thub = undef,                    # no  initial size, no  initial values 
    thud  => my integer_arrayref $TYPED_thud = [ 2, 4, 6, 8 ],           # no  initial size, yes initial values
    thuj  => my integer_arrayref $TYPED_thuj->[4 - 1] = undef,           # yes initial size, no  initial values
    yyx   => my number_hashref $TYPED_yyx = undef,
};

As you can see, we have redundant variable names in this syntax, such as plugh and $TYPED_plugh, which is necessary in order to create the Perl grammar slot we need for each property's type name. This is very awkward, and should go away completely when we upgrade to the Cor OO syntax.

tommybutler commented 3 years ago

Wow so many great ideas! I sure hope we can leverage this momentum and find a way to work together. This... This is fantastic! 😊

duncand commented 3 years ago

I mainly care that type handling is consistent in all contexts. Regular my variables, routine parameters both positional and named, object attributes, and everything else: the syntax and behaviour and feature set should be exactly the same. What the actual syntax is matters less.

In contrast to most modern languages, Raku complicates the matter somewhat because it has not just $ but also @, % and so on. With an @array, say, there's a difference between Foo @bar and @bar is Foo: the former describes the type of each element, while the latter describes the type of the whole array, or at least this is what I recall from years ago.

A big question is whether or how Perl wants to follow Raku's example, typing per whole container or per element or both, or whether Perl wants to be more like Java or most languages where the declared type is about the container as a whole; I prefer the latter personally even if it is somewhat more verbose.

duncand commented 3 years ago

Actually, if it were up to me, type signatures would always come AFTER the described thing, similar to how it is in SQL; e.g., <varname> <type> is preferred over <type> <varname>. One key reason for this is consistency between name-type pairs and name-value pairs: the name always appears first, and that is what I prefer.

willt commented 3 years ago

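As far as int truncation/rounding behavior goes, Python 2 does the same thing as Ruby and floors the result (rounds toward negative infinity). In Python 3, / always returns a float, and // is the floor-division operator that gives -3.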

>>> -7/3
-3

PHP gives you a float unless you cast to int:

<?php
  echo -7/3, "\n";
  echo (int)(-7/3), "\n";
?>
-2.3333333333333
-2

C# truncates like C:

using System;

public class Program
{
    public static void Main()
    {
        Console.WriteLine(-7/3);
    }
}
-2

In this Cor case (my $c :isa(Int) = -7/3;), since you are putting the value in an Int, I think it should truncate. If you were storing it in something that can handle the full value (float, string, untyped/dynamic), then it shouldn't truncate.

Ovid commented 3 years ago

@willt wrote:

In this Cor case (my $c :isa(Int) = -7/3;), since you are putting the value in an Int, I think it should truncate. If you were storing it in something that can handle the full value (float, string, untyped/dynamic), then it shouldn't truncate.

And that starts off an interesting discussion. I posted the Ruby example (and have a few others) to show the problem and how easy this stuff is to get wrong. You don't truncate, you round. For example, we know that -7/3 is actually -2 1/3, but we can't represent that exactly, so we use our floating-point equivalent of -2.33333333333333 (I wish we had a native decimal type). Truncating happens to work here, but what if it was -14/3? The answer is -4.66666666666667, but truncating gets us -4 instead of an (arguably) more correct -5.
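
To make the distinction concrete, here is a small sketch comparing Perl's built-in int (which truncates toward zero), POSIX::floor (which rounds toward negative infinity, matching Ruby's -7/3 result) and rounding to nearest via sprintf. The helper name `int_variants` is made up for illustration.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Three ways of forcing a quotient into an integer (helper name is hypothetical).
sub int_variants {
    my ($n, $d) = @_;
    my $q = $n / $d;
    return (
        trunc => int($q),              # toward zero, like C99 and C#
        floor => floor($q),            # toward negative infinity, like Ruby's -7/3
        round => sprintf('%.0f', $q),  # to nearest
    );
}

for my $pair ([-7, 3], [-14, 3]) {
    my %v = int_variants(@$pair);
    printf "%d/%d  trunc: %d  floor: %d  round: %d\n", @$pair, @v{qw(trunc floor round)};
}
```

Only rounding gives -2 for -7/3 and -5 for -14/3. Note that sprintf('%.0f', ...) follows the C library, which typically rounds exact halves to even (2.5 becomes 2), so "round" itself would still need a precise definition.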

But truncating is very fast and how many people actually hit this problem? Well, the C89 standard decided to punt on this entirely when they wrote:

when the division is inexact and one of the operands is negative, the result may be rounded up or rounded down, in an implementation-defined manner. (The behavior of the % operator is required to be consistent with that of the / operator.) This differs from Fortran, which requires that the result must definitely be truncated toward zero. Fortran-like behavior is permitted by C89, but it is not required.

So if you had a different C compiler, a correct C89 program in one might produce incorrect results in another! And then Fortran requires always truncating toward zero! -16.9999999999999999 becomes -16!

We need to tread carefully here.

So if we have $c = -14/3; and the dev doesn't know that $c is of type Int, we should probably round (I think I agree with that), but do we ever provide a means to let the developer know they've lost information? After all, Perl currently tries hard to not throw away information. This behavioral difference could be very surprising.

Or maybe I'm just worrying about nothing.

xenu commented 3 years ago

Perl, historically, doesn't do the "integer math" stuff

Did you forget about use integer?
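
For reference, `use integer` is lexically scoped and switches arithmetic to native integer operations inside its scope, so it gives C-style division there. A minimal sketch; per the pragma's own documentation, behaviour with negative operands follows the underlying C implementation, which on mainstream platforms truncates toward zero:

```perl
use strict;
use warnings;

my $float_q = -7 / 3;    # ordinary Perl division keeps the fraction
my $int_q;
{
    use integer;         # lexically scoped: only affects code in this block
    $int_q = -7 / 3;     # native integer division: -2 on mainstream platforms
}
print "$float_q\n$int_q\n";
```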

So if you had a different C compiler, a correct C89 program in one might produce incorrect results in another! And then Fortran requires always truncating toward zero! -16.9999999999999999 becomes -16!

This is a theoretical problem: virtually all C89 implementations truncate towards zero, and C99 requires it.

leonerd commented 3 years ago

I would be very much against any type system that distinguished /actual/ platform numbers from "any kind of object that tries to act numbery" (as per Math::BigInt, BigRat, etc.), and likewise actual platform strings from "any kind of object that tries to act stringy" (as per String::Tagged, etc.).

It is already the case that a few core operators have some small bugs regarding this overloading, but overall those can be fixed. For the most part, you can use any numbery/string-like object as if it were a native platform version of that type. I am very keen to ensure we keep that property.

Stated more explicitly: For whatever variation on syntax we come up with, I would expect this test to pass:

sub add(Num $x, Num $y) returns Num {
  return $x + $y;
}

isa_ok(add(Math::BigInt->new(1), Math::BigInt->new(2)), "Math::BigInt");
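
Today's Perl can already express a check with that property. Below is a sketch of a hypothetical `is_num` that accepts plain numeric scalars and any object whose class overloads numification (`0+`), exercised with a minimal stand-in class, `Numberish`, rather than Math::BigInt itself. `overload::Method` is part of the core overload module; treating "overloads `0+`" as the definition of numbery is an assumption, since a class could also overload only arithmetic operators.

```perl
use strict;
use warnings;
use overload ();
use Scalar::Util qw(blessed looks_like_number);

# Hypothetical Num check: plain numeric scalars, or objects whose class
# overloads numification ("0+").
sub is_num {
    my ($v) = @_;
    return 0 unless defined $v;
    if (blessed $v) {
        return overload::Method($v, '0+') ? 1 : 0;
    }
    return !ref $v && looks_like_number($v) ? 1 : 0;
}

# Minimal stand-in for a "numbery" class such as Math::BigInt (hypothetical).
package Numberish {
    use overload
        '0+'     => sub { $_[0]{n} },
        '+'      => sub { Numberish->new($_[0]{n} + (ref $_[1] ? $_[1]{n} : $_[1])) },
        fallback => 1;
    sub new { my ($class, $n) = @_; return bless { n => $n }, $class }
}

sub add {
    my ($x, $y) = @_;
    die "add() expects two Nums\n" unless is_num($x) && is_num($y);
    return $x + $y;    # with objects, the class's own overloaded + runs
}

my $sum = add(Numberish->new(1), Numberish->new(2));
print ref($sum), " holding ", $sum->{n}, "\n";    # Numberish holding 3
```

The object comes back as the same class it went in as, which is the property the isa_ok test above is asking for.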

willt commented 3 years ago

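I get the same results with c89 or c99. Interestingly, the higher-precision literal in "a" ends up as -17: -16.9999999999999999 has more significant digits than a double can hold, so it is already rounded to -17.0 before the conversion to int truncates it. The same behavior happens with positive numbers as well.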

#include <stdio.h>

int main ()
{
  int a = -16.9999999999999999;
  int b = -7 / 3;
  int c = -14 / 3;
  int d = -16.99999999999999;
  printf ("a: %i\nb: %i\nc: %i\nd: %i\n", a, b, c, d);
  return 0;
}
$ gcc -std=c99 -o main main.c
$ ./main
a: -17
b: -2
c: -4
d: -16

daotoad commented 3 years ago

As far as coercion on initialization goes, I’d say, avoid it.

If you want coercion, one value or the other needs to provide a path.

my $i :isa(Int) = 3/4;  # should die
my $i :isa(Int) :from(Number, sub { Int shift } )= 3/4;  # Lvalue coerces 
my $f :to(Int, sub {Int shift}) = 3/4;
my $i :isa(Int) = $f; # rvalue coerces

Being able to add constraints with something like Raku’s where would be awesome—it’s one of my favorite bits of Raku.

my $i :isa(Int) :where(sub {shift > 0 }) = 3;
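
daotoad's distinction between no implicit coercion and explicit, opt-in coercion can be sketched with closures in today's Perl. The `constraint` helper and its `name`/`check`/`where`/`coerce` keys are invented for illustration, and the coercion is assumed to truncate; the point is only that 3/4 is rejected unless a coercion was asked for.

```perl
use strict;
use warnings;

# Hypothetical helper: build a checker from a base test, optional where-clauses,
# and an optional coercion. Nothing is coerced unless a coercion is given.
sub constraint {
    my (%spec) = @_;    # name, check, where => [ ... ], coerce
    return sub {
        my ($value) = @_;
        $value = $spec{coerce}->($value) if $spec{coerce};
        die "$spec{name} check failed\n" unless $spec{check}->($value);
        for my $where (@{ $spec{where} || [] }) {
            die "$spec{name} where-clause failed\n" unless $where->($value);
        }
        return $value;
    };
}

my $is_int = sub { defined $_[0] && !ref $_[0] && $_[0] =~ /\A-?[0-9]+\z/ };

# Roughly: my $i :isa(Int) :where(sub { shift > 0 }) = ...;
my $positive_int = constraint(name => 'Int', check => $is_int, where => [ sub { $_[0] > 0 } ]);

# Roughly: my $i :isa(Int) :from(Number, sub { Int shift }) = ...;  (truncation assumed)
my $int_from_num = constraint(name => 'Int', check => $is_int, coerce => sub { int $_[0] });

print $positive_int->(3), "\n";      # 3
print $int_from_num->(3/4), "\n";    # 0
my $ok = eval { $positive_int->(3/4); 1 };
print $ok ? "accepted\n" : "rejected: $@";    # rejected: Int check failed
```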

Any discussion of this also needs to include syntax for declaring and specifying types.

dur-randir commented 3 years ago

We need to standardize our type syntax and semantics. Why do we need it? Perl has always been TMTOWTDI, so if anything can be done as a module first, it should be done as a module, and not brought straight into core.

Furthermore, what is the cost of this system to code that doesn't use it and doesn't need it? For example, 'tie' and 'taint' both have a heavy impact on code that will never use or care about them. And removing things is much harder than adding them; smartmatch is an excellent example of overcomplication here.

tonycoz commented 3 years ago

How deep does the type system go? Can we declare complex types? E.g., an array containing arrays containing integers, or an array of integers or undef entries.

Another option over declared types is type inference, e.g., if a scalar is used like @$foo then $foo must be an array reference or an object overloading @{}.

leonerd commented 3 years ago

@tonycoz I would imagine if you can't declare deep type annotations like that the entire feature becomes so much less useful.

Whereas the moment you can declare those deep types, especially onto mutable storage (i.e. arrays and hashes) that can be passed by reference into other functions, you hit all the exciting fun of covariant and contravariant type extension, which often causes trouble in other languages with these features.

Ovid commented 3 years ago

@tonycoz Did you know there's an old type inference module on the CPAN? Devel::TypeCheck. Unfortunately, it didn't really go anywhere and, sadly, the PDF it refers to (for explaining it) is not online.

As I recall, the work wasn't entirely successful because we could infer the type of [1,2,3], but [1,2,"3"] is ambiguous and [1,2,{ three => 3 }] is a trainwreck.

Also, type inference comes far later than types.

leonerd commented 3 years ago

Oh yeah, Perl lacks a "tuple" type as distinct from a regular array, so inference gets very confused about that. Most uses of arrays in Perl end up being homogeneous data storage (the "type" of each element should be considered the same), but when people use an array as a cheap tuple/struct thing, the elements are often heterogeneous, and type inference gets confused.

I have tried to alleviate the problem by providing modules like Struct::Dumb, which folks can use instead of heterogeneous array-as-tuple, but that hasn't had as much uptake as I'd have liked. :(

perigrin commented 3 years ago

So I just got my new job in a shop with a lot of legacy Perl. I've never worked in Perl before but I have a lot of experience with Go, TypeScript and Ruby. What happens when I update sub comp { my ($x, $y) = @_; $x cmp $y } to sub comp (Int $x, Int $y) Bool { $x cmp $y }? Do I get one error or two?
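
Whatever the exact answer, that particular sub has two latent problems a type checker could reasonably flag: cmp compares its operands as strings even when both are Ints, and it returns -1, 0 or 1 rather than a true/false value, which a strict Bool return type would presumably reject. The first one is easy to demonstrate in current Perl:

```perl
use strict;
use warnings;

# cmp compares as strings, <=> compares as numbers; both return -1, 0 or 1.
my ($x, $y) = (10, 9);
my $string_cmp  = $x cmp $y;    # -1: "10" sorts before "9" character by character
my $numeric_cmp = $x <=> $y;    #  1: 10 is numerically greater than 9
print "$string_cmp $numeric_cmp\n";    # -1 1
```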

Ovid commented 3 years ago

I've added some more thoughts about Type Systems to the Cor wiki.

tommybutler commented 3 years ago

I've added some more thoughts about Type Systems to the Cor wiki.

@Ovid great write up! Thank you! My thoughts in response (nobody needs to agree) are that...

wbraswell commented 3 years ago

@Ovid I have spent many years working on a real type system for the Perl compiler, as detailed in my comment above: https://github.com/Perl/perl5/issues/17894#issuecomment-650266071

How can we best work together to integrate your new ideas with the existing type system in the Perl compiler?

Ovid commented 3 years ago

@wbraswell One issue with things like integer_array and integer_arrayref is that they are not composable. Thus, the system seems to need to predefine all types that people might want in advance. With work done in Perl, we have things like ArrayRef[Int] which is composable, but possibly can't be optimized in the way that your work does.

When in doubt, I defer to optimizing developer needs rather than computer needs. Thus, if someone wants to write HashRef[ArrayRef[Int]], they should be able to do so now rather than wait for compiler support.
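
The composability point can be illustrated with plain closures: if types are predicates and ArrayRef/HashRef are functions from a predicate to a new predicate, then nested types like HashRef[ArrayRef[Int]] come for free. This is a hypothetical sketch (Int, ArrayRef and HashRef here are ordinary subs defined in the block, not an existing API), and it says nothing about the compile-time optimization RPerl is after; a compiler could still recognize common compositions and specialize them, as duncand suggests later in the thread.

```perl
use strict;
use warnings;

# Hypothetical sketch: a type is a predicate (a code ref returning true/false),
# and a parameterized type is a function that builds a new predicate from another.
sub Int { return sub { defined $_[0] && !ref $_[0] && $_[0] =~ /\A-?[0-9]+\z/ } }

sub ArrayRef {
    my ($of) = @_;
    return sub {
        my ($v) = @_;
        return 0 unless ref $v eq 'ARRAY';
        for my $elem (@$v) { return 0 unless $of->($elem) }
        return 1;
    };
}

sub HashRef {
    my ($of) = @_;
    return sub {
        my ($v) = @_;
        return 0 unless ref $v eq 'HASH';
        for my $elem (values %$v) { return 0 unless $of->($elem) }
        return 1;
    };
}

# Composed on demand, with no predefined integer_arrayref_hashref type.
my $check = HashRef(ArrayRef(Int()));
for my $data ({ a => [0, 1, 2], b => [3, 4, 5] }, { a => [0, 1, 'two'] }) {
    print $check->($data) ? "ok\n" : "not ok\n";
}
```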

tommybutler commented 3 years ago

@Ovid we've waited an awful long time for a type system to just get it wrong because we're in a hurry. Programmer needs and computer needs aren't mutually exclusive, particularly in the era of data science when performance matters more than almost anything else except for the right answer. We reduce the carbon footprint of computing when we make more efficient programming languages that execute more quickly and reduce the expense involved in training models and in deploying them and maintaining them in production. The blessing of a type system is not just what it provides in terms of strictures, but also in performance. I can't emphasize this enough as someone who is actively working in this industry.

Enterprise businesses, educational institutions, governments, and human beings need performance... now possibly more than ever. If a feature is missing from the compiler, let's add that feature, and let's make sure that features are extensible so that if something is necessary outside of the compiler, we can implement it in Perl space -- another pair of concepts that don't need to be mutually exclusive either. Why would we design a system that can't be extended? (Edit - I don't think we are, but I'm actively advocating for the continuance of the long tradition of extensibility that Perl enjoys, enabling people to try things out in Perl/runtime space before stealing good ideas and optimizing them away into the compiler for better performance where it makes good sense.)

wbraswell commented 3 years ago

@Ovid Yes, the latest release of RPerl v7.0 (on the 4th of July) now includes basic "dynamic dispatch" capabilities for all 3 compile modes:

PERLOPS_PERLTYPES
CPPOPS_PERLTYPES
CPPOPS_CPPTYPES

The use of dynamic dispatch is part of our ongoing effort to enable composable multi-dimensional data structures in the Perl compiler. Future releases of RPerl will continue to build upon this until we have fully composable arbitrary data structures.

There are probably multiple options for how to merge our somewhat different approaches to type definitions...

One option is that integer_arrayref_hashref can be automatically mapped to (and from) HashRef[ArrayRef[Int]] at preprocessor time.

Alternatively, if there is overwhelming technical reasoning to permanently abandon the integer_arrayref_hashref syntax, then I am open to that possibility as well.

Another important consideration is the generation of optimized code for nested data structures, which can likely be achieved by the use of C++ templates and dynamic dispatch, both of which are now demonstrated in the new RPerl v7.0 compiler.

What about something like this?

my hashref{arrayref[integer]} $foo = { a => [0, 1, 2], b => [3, 4, 5], c => [6, 7, 8] };

tommybutler commented 3 years ago

@wbraswell reasonable folks will accept the reality that we can't solve the entirety of the world's problems in a single release, and that we should just start to head in the direction that we want to go, trusting one another that we will clean up our messes as we go. I don't advocate for the accumulation of technical debt, but rather the immediate payoff of the same in achievable increments. Let's start paying down the debt. Let's start marching forward toward the end goal. If it helps to have a roadmap completely laid out, let's do that, while remaining flexible on how and what we deliver. In the meantime, that we don't have all of the answers shouldn't prevent us from answering some.

Additionally, reasonable folks shouldn't have much of a gripe that the first release of any type system won't cover all use cases. As we see the need for things like arrayref[arrayref[num],arrayref[int,string],arrayref[string],arrayref[*]], those can become possible over time wherever they aren't possible immediately.

Edit - syntax fix

wbraswell commented 3 years ago

@tommybutler

PHASE 1: We can start with homogeneous static data structures, which means the composable type definitions will contain no comma (,) characters:

arrayref[ integer ]
hashref{ number }
arrayref[ arrayref[ number ] ]
hashref{ arrayref[ integer ] }
hashref{ arrayref[ arrayref[ string ] ] }
hashref{ arrayref[ hashref{ hashref{ arrayref[ string ] } } ] }

PHASE 2: To handle dynamic data structures, we already have the scalar, object, and unknown data type definitions:

arrayref[ scalar ]   # scalar type can contain integer or number or string etc
arrayref[ object ]   # object type can contain blessed object of any class
arrayref[ unknown ]  # unknown type could be scalar or object or array or arrayref or hash or hashref etc

PHASE 3: Later we can add heterogeneous static data structures, with comma (,) characters in the type definitions:

arrayref[ hashref{ string }, arrayref[ hashref{ number }, hashref{ integer } ] ]

PHASE 4: Eventually we can arrive at heterogeneous dynamic data structures:

arrayref[ hashref{ scalar }, arrayref[ hashref{ object }, hashref{ unknown } ] ]

@Ovid What do you think about this 4-phase approach as the basis for a roadmap moving forward?

duncand commented 3 years ago

@wbraswell To address one point, "unknown" is a very poorly descriptive name to refer to the maximal type. Every type is "unknown" to some degree or other. In fact we DO know what the maximal type is, we can put anything in it.

I propose the name "any" (short for "anything") in place of "unknown". Inspired by Raku and used by my own languages too.

duncand commented 3 years ago

@wbraswell @Ovid I agree 100% that having distinct system declarations like integer_arrayref is a horrible idea. It either sets up a combinatorial explosion of types or is completely arbitrary. The only correct thing for the system to provide is the basic composable elements, so that one canonically writes things like arrayref[integer] instead. There is no reason a Perl compiler can't simply look for patterns like arrayref[integer] and, behind the scenes, use the specialized implementation it would have used for the conceptual integer_arrayref.
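Here is a minimal plain-Perl sketch of the composition point that needs no new syntax. A checker for arrayref[ T ] or hashref{ T } is built from a checker for T, so hashref{ arrayref[ integer ] } falls out without a dedicated type. The names is_integer, arrayref_of and hashref_of are hypothetical. is_integer is a loose regex test that also accepts the string "7", so it deliberately does not implement the stricter integer semantics argued for elsewhere in this thread.

```perl
use strict;
use warnings;

# Hypothetical leaf check: a defined, non-reference scalar that looks like an integer.
# Note: this regex accepts the string "7" as well as the number 7.
sub is_integer {
    my ($v) = @_;
    return defined($v) && !ref($v) && $v =~ /\A-?[0-9]+\z/ ? 1 : 0;
}

# Build a checker for "arrayref[ T ]" from a checker for T.
sub arrayref_of {
    my ($elem_ok) = @_;
    return sub {
        my ($v) = @_;
        return 0 unless ref($v) eq 'ARRAY';
        for my $e (@$v) { return 0 unless $elem_ok->($e) }
        return 1;
    };
}

# Build a checker for "hashref{ T }" from a checker for T (checks values only).
sub hashref_of {
    my ($value_ok) = @_;
    return sub {
        my ($v) = @_;
        return 0 unless ref($v) eq 'HASH';
        for my $e (values %$v) { return 0 unless $value_ok->($e) }
        return 1;
    };
}

# hashref{ arrayref[ integer ] } is composed, not declared as its own type.
my $hashref_of_int_arrayrefs = hashref_of( arrayref_of( \&is_integer ) );
```

With this, $hashref_of_int_arrayrefs->({ a => [0, 1, 2], b => [3, 4, 5] }) is true. A value like { a => [0, 'x'] }, or a top-level arrayref, is rejected. A compiler could still recognize the same composition and swap in a specialized implementation.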

duncand commented 3 years ago

@wbraswell @Ovid I also believe the type system should NOT distinguish array and arrayref, at least with the traditional meanings of their being the same thing but for a difference in symbol table syntax: @foo = (1,2,3) vs $foo = [1,2,3]. I would not even use the multiple terms to differentiate, say, immutable vs mutable variants. Instead I would use very different terms for the two, much as Raku does. Generally speaking, I feel we want to defer to Raku's design on any aspect where it makes sense for Perl and the choice is otherwise arbitrary. (Personally I strongly dislike Perl's @ and % sigil options. I use $ for all variable names in Perl no matter what they contain, as being much more consistent.)

duncand commented 3 years ago

If Perl is intending to expand base features over time then I recommend that terms like "tuple" be reserved for that time when it is a new base type that isn't an array or hash and is fundamentally about being a heterogeneous collection rather than a homogeneous collection. I also feel that the meaning of 'tuple' should be analogous to a record or struct or anonymous object in that it is explicitly a collection of named attributes, but I can see the arguments for it being a positional collection instead, but either way each element has a specific meaning. So I mean don't use "tuple" now to mean a means of declaring an "array", which is conceptually distinct with each element having no distinct meaning.

duncand commented 3 years ago

@wbraswell @Ovid I feel that a fundamental thing we want to get right early on is defining exactly what an "integer" or a "number" etc are in the type system. I feel that it should be strict so that say [5,7,9] is considered integers, ["5","7","9"] is considered strings and NOT integers, and that [5,"7",9] is considered mixed or just scalars. It is critical that just because a string is coercible to a number that it is not treated AS a number when asking "what is this".
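One way to approximate the "what is this" question in today's Perl 5 is to read the flags perl keeps on each scalar through the core B module, rather than asking whether a value can be coerced. The helper strict_kind below is hypothetical. It gives the string flag priority so that "7" is reported as a string. Perl also sets extra flags as a value is used in other contexts; older perls, for example, added the string flag when an integer was printed. The result therefore depends on a value's history, so treat this as an illustration of the problem rather than a reliable type query.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: report what a scalar *is*, based on the public flags
# perl has set on it, instead of what it could be coerced to.
# The string flag wins, so a value carrying both string and numeric flags
# is reported as a string.
sub strict_kind {
    my ($value) = @_;
    return 'undef' unless defined $value;
    return 'ref'   if ref $value;
    my $flags = B::svref_2object( \$value )->FLAGS;
    return 'string'  if $flags & B::SVf_POK();
    return 'integer' if $flags & B::SVf_IOK();
    return 'number'  if $flags & B::SVf_NOK();
    return 'unknown';
}

# Freshly written literals: integer, string, integer, number
my @kinds = map { strict_kind($_) } (5, "7", 9, 7.2);
```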

wbraswell commented 3 years ago

@duncand

UNKNOWN DATA TYPE: I am open to a community discussion on the naming of the unknown or any data type. You may understand the current choice of unknown when you look at the compiler source code: https://github.com/wbraswell/rperl/blob/e2cd8f6dc34f6c6da8aa64cc3f69ebee297ba27b/lib/rperltypes.pm#L344-L361

sub type_fast {
    { my string $RETURN_TYPE };
    ( my unknown $variable ) = @ARG;
    if ( not defined $variable ) { return 'unknown'; }
    if ( main::RPerl_SvNOKp($variable) ) { return 'number'; }
    elsif ( main::RPerl_SvIOKp($variable) ) { return 'integer'; }
    elsif ( main::RPerl_SvPOKp($variable) ) { return 'string'; }
    else { return 'unknown'; }
}

COMPOSABLE TYPE DEFINITIONS: Yes you can see my previous comments where I have proposed a 4-phase roadmap for creating composable data type definitions. https://github.com/Perl/perl5/issues/17894#issuecomment-657106014

ARRAY VS ARRAYREF: I also thought we could get away with treating arrays the same as arrayrefs, but it has proven over time to be semantically incorrect. First of all, arrays and array references are distinctly different Perl data structures with different behavior. Secondly, arrays and array references are (again) distinctly different C++ data structures with (again) different behavior. Third, there are Perl operations which require array data structures, and there are other Perl operations which require array references instead. Arrays and array references are both syntactically and semantically different, they can not be combined or treated as equivalent.

TUPLE VS OBJECT: What you are describing is an object. We don't have structs or tuples in Perl, we have objects. Ovid's new OO system "Corinna" AKA "Cor" addresses the handling of OO syntax and semantics. The Perl compiler generates C++ classes and objects in the back-end. Optimizing C++ compilers will optimize objects to be approximately equivalent to structs in both runtime performance and memory usage. For these reasons we should be fine with objects using Ovid's new OO syntax and compiled with the Perl compiler.

INTEGER VS NUMBER: This has been decided long ago by Perl internals. See the code example above for SvIOKp(), SvNOKp(), and SvPOKp(). These stand for "Scalar value Integer OKay", "Scalar value Number OKay", and "Scalar value Pointer (to character string) OKay", respectively; the trailing p marks the private variant of each flag. In other words, they are Perl's internal tests for whether a scalar variable contains an integer, number, or string value. These macros are defined in Perl's sv.h header (pulled in by perl.h). RPerl uses them to build the real Perl type system, as described in all my posts in this thread. Further, you may see the equivalent C++ type definitions which I have created as part of the Perl compiler:

https://github.com/wbraswell/rperl/blob/e2cd8f6dc34f6c6da8aa64cc3f69ebee297ba27b/lib/RPerl/DataType/Integer.h#L99
typedef long integer;

https://github.com/wbraswell/rperl/blob/e2cd8f6dc34f6c6da8aa64cc3f69ebee297ba27b/lib/RPerl/DataType/Number.h#L16
typedef double number;

https://github.com/wbraswell/rperl/blob/e2cd8f6dc34f6c6da8aa64cc3f69ebee297ba27b/lib/RPerl/DataType/String.h#L9
typedef std::string string;

So, to answer your question of "what is this?", please see the type_fast() Perl subroutine above, which has its equivalents in C++ here: https://github.com/wbraswell/rperl/blob/e2cd8f6dc34f6c6da8aa64cc3f69ebee297ba27b/lib/RPerl/DataStructure/Array/SubTypes1D.tpp#L166-L191 https://github.com/wbraswell/rperl/blob/e2cd8f6dc34f6c6da8aa64cc3f69ebee297ba27b/lib/RPerl/DataStructure/Array/SubTypes1D.tpp#L193-L223

type_enum type_fast_enum(SV* variable) {
    if (NOT_DEFINED(variable))    { return TYPE_unknown; }
    if      ( SvNOKp(variable) )  { return TYPE_number;  }
    else if ( SvIOKp(variable) )  { return TYPE_integer; }
    else if ( SvPOKp(variable) )  { return TYPE_string;  }

    else if ( number_arrayref_CHECK( variable, 1) )  // no_croak = 1
                                  { return TYPE_number_arrayref; }
    else if ( integer_arrayref_CHECK(variable, 1) )  // no_croak = 1
                                  { return TYPE_integer_arrayref; }
    else if ( string_arrayref_CHECK( variable, 1) )  // no_croak = 1
                                  { return TYPE_string_arrayref; }
    else if ( SvOK(variable) && 
              SvAROKp(variable) ) { return TYPE_arrayref; }

    else                          { return TYPE_unknown; }
}
duncand commented 3 years ago

@wbraswell Thank you for your quick responses. Regarding ARRAY vs ARRAYREF when I say treat them the same I mean that when talking about a composed type there is only one thing that can be an element of another. My understanding is that an ARRAY is strictly a top-level lexical concept with its own sigil and it can't be an element of something else. Is it not correct that in Perl today you can have an ARRAYREF as an element of an array or hash but there is no distinct concept of having an ARRAY as an element of an array or a hash? When I say treat the same I mean Perl doesn't support arrayref[array] and arrayref[arrayref] as distinct concepts, does it? If they are distinct, what are Perl expressions to yield the former vs the latter?

wbraswell commented 3 years ago

@duncand Yes, you are correct. When composing nested data structures we must only use arrayref and hashref (by reference), not array or hash by value. This is due to Perl's semantics of automatically flattening arrays (or hashes) into a single list when they are passed by value.

Still, we must maintain the independent syntax and semantics of array vs arrayref for use in non-nested (1-D) data structures:

array( integer )
arrayref[ integer ]

Also, array by value and hash by value can both be used as the top-level (outer-most) type definition for nested data structures:

array( arrayref[ number ] )
hash( arrayref[ integer ] )
hash( arrayref[ arrayref[ string ] ] )
hash( arrayref[ hashref{ hashref{ arrayref[ string ] } } ] )
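The flattening behavior behind this rule is easy to see in plain Perl 5, with no RPerl types involved. List assignment flattens arrays, while references keep the nesting. A minimal sketch:

```perl
use strict;
use warnings;

my @first  = (1, 2);
my @second = (3, 4);

# Arrays "by value" flatten: the result is one list of 4 scalars.
my @flattened = (@first, @second);

# Array references nest: the result is 2 elements, each an arrayref.
my @nested = (\@first, \@second);

# An element can only ever hold a scalar, so arrayref[ arrayref[ T ] ]
# is expressible, but an array-by-value element is not.
my $count_flat   = scalar @flattened;   # 4
my $count_nested = scalar @nested;      # 2
my $inner_type   = ref $nested[0];      # 'ARRAY'
```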
willt commented 3 years ago

@wbraswell I think PHASE 1 and 2 should be combined.

duncand commented 3 years ago

I agree with willt and would also propose PHASE 3 and 4 be combined. So the first combined phase is about homogeneous structure support and the second combined phase is heterogeneous structure support.

wbraswell commented 3 years ago

@willt & @duncand I personally welcome and encourage your feedback on all issues which are appropriate for public debate! :-)

On the other hand, please leave details such as the order and rate of implementation to those who are actually writing the code.

Ovid commented 3 years ago

I suspect I'm going to ruffle some feathers, but ...

Just to clarify a few things.

  1. I'm designing Cor, but I'm not a core Perl developer
  2. Sawyer's verbally approved Cor, but it doesn't mean he, or P5P, will agree to all issues
  3. Lack of typing is a serious issue impacting Cor
  4. Lack of typing means inter-language interoperability relies on heuristics and is easy to get wrong
  5. I have zero authority here

That being said, here are some of my thoughts and concerns given what I've read above.

First, absent a type declaration/annotation/attribute of some sort, I assume there will be absolutely no changes in Perl's behavior.

Second, if we provide optional types, those types should follow the principle of least surprise. But who should be least surprised? As a reasonably experienced Perl developer, I know that 0 + [] produces a valid integer. I know that "7 apples" + 2 equals 9. Neither of those things is something I want in a "type" system. At all. If there's a type system in Perl at some point, it needs to behave as people tend to expect a type system to behave. Thus, when serializing something to JSON:

my $int :isa(Int)   = 0;      # Serializes to the number zero
my $str :isa(Str)   = "0";    # Serializes to the string zero
my $bool :isa(Bool) = 0;      # Serializes to `false`

That last (boolean) example, though, raises some interesting questions. Why is zero false in that case? Experienced Perl devs sometimes return the string "0 but true" to have something that evaluates as the number zero but is true in boolean context. It's one of those odd quirks of our language. But what should that do? Well, some people will think we can seamlessly glue Perl's "dynamic" typing to a real type system, but I don't think it will be that easy. For example:

my $int :isa(Int) = 7.2;

That should throw an exception.

What about my $int :isa(Int) = "7";? I think that without an explicit casting system, that should be an exception. Why? Well, you're reading data from a file:

while (my $line = <$fh> ) {
    chomp($line);
    my $int :isa(Int) = $line;
    # ... do something
}

$line above is a string. What should happen for each of the following lines?

I want Perl to remain Perl, but if you're going to give me types, let them behave like types and not do magical coercion from " 7 apples " into an integer. Yes, that means the developer will have to do more work to make things "correct", but if we want types, I don't think we can have our cake and eat it too.
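Here is a minimal sketch of the explicit route. It assumes a hypothetical to_int() cast, not any existing Perl or Cor API. The cast accepts a string only if the whole string is an integer and croaks otherwise, so "7" passes while "7 apples" and "7.2" are rejected instead of being coerced.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical explicit cast: accept only a string that is exactly an integer
# (optional sign, then digits), and croak on anything else instead of coercing.
# Note: very long digit strings would still overflow to a floating-point value.
sub to_int {
    my ($value) = @_;
    croak 'to_int: undefined value' unless defined $value;
    croak "to_int: '$value' is not an integer"
        unless !ref($value) && $value =~ /\A[+-]?[0-9]+\z/;
    return 0 + $value;
}

# Simulated lines read from a file handle.
my @lines = ("7\n", "7 apples\n", "7.2\n");
my @results;
for my $line (@lines) {
    chomp(my $copy = $line);
    my $int = eval { to_int($copy) };
    push @results, defined $int ? $int : 'error';
}
# @results is (7, 'error', 'error')
```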

And that brings me back to booleans. What should the following print?

my $int :isa(Int) = 0;
say $int ? "Yes" : "No";

I think most people would argue that this should evaluate as false, but when we're dealing with inter-language interoperability, that's no longer clear to me. Let's face it, developers produce a lot of JSON. One of our clients uses lots of SOAP. With a proper type system we need:

The last point will avoid numerous errors. For example:

if ( $foo > $bar ) {
    ...
}

That's perfectly valid Perl. But with types, what if $bar is a string? Or a boolean? In most type systems out there, that's an error because if ( 3 > true ) {...} doesn't make sense in any logical system I can think of, but for Perl today, that's just fine and dandy (and evaluates to true).

But what about this?

my $bool :isa(Bool) = true;
if ($bool) {...}

There's nothing unusual or surprising there, but what if $bool isn't typed? That's still fine. But what if $bool is undefined? Consider this Java:

public class Main {
    public static void main(String[] args) {
        boolean check;
        if (check) {
            System.out.println("Yes");
        }
        else {
            System.out.println("No");
        }
    }
}

Trying to compile that will give you the following:

Main.java:4: error: variable check might not have been initialized
        if (check) {
            ^
1 error

That's because if that variable isn't initialized when the test comes, Java can't tell whether it's true or false. I would argue that this is the correct behavior, and that Perl should behave the same way if, and only if, the variable has a type:

my $bool :isa(Bool);
if ($bool) {...}    # runtime error

Without the type annotation, the boolean check will evaluate as false, but with the annotation, it throws a runtime error instead.
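Stock Perl 5 can approximate this runtime behavior with a tied scalar, at a real cost in speed. Here is a minimal sketch, assuming a hypothetical TypedBool class; it is not a proposal for how :isa(Bool) would be implemented. Reading the variable before it has been assigned croaks instead of evaluating as false. Storing anything other than a true/false value croaks instead of coercing. The sketch uses the package NAME BLOCK syntax, so it needs perl 5.14 or later.

```perl
use strict;
use warnings;

# Hypothetical runtime emulation of `my $bool :isa(Bool);` using a tied scalar.
package TypedBool {
    use Carp qw(croak);

    sub TIESCALAR { my ($class) = @_; my $value; return bless \$value, $class }

    # Reading an uninitialized typed boolean is an error, not "false".
    sub FETCH {
        my ($self) = @_;
        croak 'Bool variable used before initialization' unless defined $$self;
        return $$self;
    }

    # Accept only 1, 0, or perl's own false value (''); no other coercion.
    sub STORE {
        my ($self, $new) = @_;
        croak 'Bool variable cannot hold undef' unless defined $new;
        croak "'$new' is not a Bool"
            unless !ref($new) && ($new eq '1' || $new eq '0' || $new eq '');
        $$self = $new ? 1 : 0;
        return;
    }
}

package main;

tie my $flag, 'TypedBool';
my $read_before_init = eval { my $x = $flag; 1 } ? 'ok' : 'died';   # 'died'
$flag = 1;
my $after = $flag ? 'Yes' : 'No';                                   # 'Yes'
my $bad_store = eval { $flag = 'yes'; 1 } ? 'ok' : 'died';          # 'died'
```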

I think this will frustrate many Perl developers, but if we have a type system to allow for greater safety, we should have a type system that behaves like a type system. Here's another example from Java:

public class Main {
    public static void main(String[] args) {
        int check = 7;
        if (check) {
            System.out.println("Yes");
        }
        else {
            System.out.println("No");
        }
    }
}

That generates the following error:

Main.java:4: error: incompatible types: int cannot be converted to boolean
        if (check) {
            ^
1 error

Again, that's safe behavior. Fixing it means being a more careful programmer and having if (check != 0) {. (And we'd do something similar in Perl).

The reason I make these suggestions using Java's type system as the example is that it's a sane system: very predictable, and safe. But if I do this:

foreach my $item (@array) {
    if ($item) { ... }
}

If that runs over the first 20 elements and they're all typed as booleans, great! When it gets to the 21st element and it's a string, not great! There's a good chance that I made a boo-boo there and I want my type system to catch that. Please don't give me a type system that suggests that I might have type safety when, in fact, I do not.

tommybutler commented 3 years ago

@Ovid TL;DR - I agree completely on your points about coercion and least surprise.

MST has on a few occasions referred to code that surprises you by taking advantage of strange behavior as "clever", and clever code is not good. Returning zero as a string that is also a true value seems very clever to me, as does 0 + []. Clever code is ambiguous in expressing the intentions of the person who wrote it. Future maintainers then have to guess what the code is supposed to do, and that's just all kinds of frustrating.

Of even greater importance among your points above, I think, is that we don't try to coerce things. We avoid surprise by being explicit. If people want to coerce things, they can write that logic themselves by explicitly using whatever casting options the type system provides (and it should provide some).

My reasons for this thinking are based on the desire to avoid the overhead of magical behavior and the ambiguity of implicit DWIMmery. The magical nature of coercion runs counter to the goal of performance as a feature. I've already advocated for that goal in this thread for some very key reasons, which I won't repeat here for the sake of brevity.

djzort commented 3 years ago

First thought: It's okay to draw a line in the sand and say "if you want to pass around more complicated data structures than /this/, you will have to validate them yourself". That's more or less what you get in Python 3 with the very common Dict[str, Any].

Otherwise you're having to design a system to describe everything possible, and it will likely just be a PITA and people will do it manually anyway. (See https://github.com/eserte/p5-Kwalify/)

Second thought: Most of the above syntax ideas seem wordy and unperlish. My suggestion would be to borrow from Rust, which is so Perlish that GitHub often mistakes it for Perl. (To me it's a joke that a person might get excited about Rust but then deride Perl as unreadable, but I digress.) People really <3 Rust atm, so surely there is merit in casting our envious eyes upon its Perlish syntax.

For a quick look, check out https://doc.rust-lang.org/reference/types.html and for a basic intro https://www.tutorialspoint.com/rust/rust_data_types.htm

So why not leave "my" as is and introduce "let" as a "my" equivalent in scope, but one that explicitly types the variable?

Here's some Rust examples made to look like Perl.

let $company_string = "TutorialsPoint";  # implicit string type
let $rating_float = 4.5;                 # implicit float type
let $is_growing_boolean = true;          # implicit boolean type
let $icon_char = '♥';                    # implicit unicode character type

De-magicking strings in the process and giving people the blessings of types whilst remaining properly lazy.

Here's some examples derived from Rust numbers

let $result = 10;    # i32 by default
let $age:u32 = 20;
let $sum:i32 = 5-15;
let $mark:isize = 10;
let $count:usize = 30;

From a syntax point of view, I don't know if a single colon is the way to go specifically. Spaces might be better. Like this:

let $sum i32 = 5 - 15;
let($sum, i32) = 5 -15;

To me that looks very Perlish. It's concise and brief, which has always, IMO, been a central tenet of Perl syntax.

Then just borrow the type keyword from cperl (it's about 4 lines of code apparently) and you have code like this;

let $age u32 = 18;
if (type $age eq 'u64') { ... }
my (%handlers) = ( u32 => sub {}, u64 => sub {} );
$handlers{type $age}->($age);

This also looks great looping an array of mixed types

for my $foo (@mixed) {
    $handlers{type $foo}->($foo);
}
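Perl 5 itself has no type builtin, so the type $age calls above depend on adopting cperl's keyword. The dispatch-table half of the idea already works today if the key comes from something that exists. This sketch keys on the built-in ref, which distinguishes kinds of references. Unlike the proposed type keyword, ref cannot tell integers from strings. The handler bodies are illustrative only.

```perl
use strict;
use warnings;

# Dispatch on the kind of value; '' is what ref() returns for a plain scalar.
my %handlers = (
    ''    => sub { "scalar($_[0])" },
    ARRAY => sub { 'array of ' . scalar @{ $_[0] } },
    HASH  => sub { 'hash with keys ' . join ',', sort keys %{ $_[0] } },
);

my @mixed = (42, [1, 2, 3], { b => 2, a => 1 });
my @described;
for my $item (@mixed) {
    # Fail loudly instead of calling an undefined handler.
    my $handler = $handlers{ ref($item) } or die 'no handler for ' . ref($item);
    push @described, $handler->($item);
}
```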

We can also introduce a fun backronym.

MY = Magical Yarn (as in string)
LET = Lexical & Explicitly Typed

Yes, I have intentionally left out typed versions of "our" and would suggest that be the way to go, but the above would still be equally valid for "local".

duncand commented 3 years ago

@Ovid said:

I suspect I'm going to ruffle some feathers, but ...

Thank you for expressing all this, I agree with that post 100%, those are my concerns as well.

duncand commented 3 years ago

@djzort said:

let $result = 10;    # i32 by default
let $age:u32 = 20;
let $sum:i32 = 5-15;
let $mark:isize = 10;
let $count:usize = 30;

Pardon me if this is covered elsewhere, but I feel it is more reusable and consistent if stuff like the :u32 was attached to the literal instead of the variable, because then it will be correct as a part of arbitrary expressions.

So this is better, and is also what Java/C#/etc do:

let $result = 10;    # i32 by default
let $age = 20:u32;
let $sum = 5:i32-15:i32;
let $mark = 10:isize;
let $count = 30:usize;
djzort commented 3 years ago

@djzort said:

let $result = 10;    # i32 by default
let $age:u32 = 20;
let $sum:i32 = 5-15;
let $mark:isize = 10;
let $count:usize = 30;

Pardon me if this is covered elsewhere, but I feel it is more reusable and consistent if stuff like the :u32 was attached to the literal instead of the variable, because then it will be correct as a part of arbitrary expressions.

You're correct that :type is convenient and used in other languages. To contribute to the taxonomy, PostgreSQL uses expression::type in addition to CAST('500' AS INTEGER). (Although SQL is not something to be envied for good design. Alas, QUEL lost that race.)

Inserting something between my and = seems to be what is dominating the discussion, which is roughly:

verb adverb subject = object

(Someone with a better linguistics background can probably correct me)

So its interesting to bring up an alternative word order, i.e.

verb subject = object : adverb

It's interesting to consider something like the following, although to me it doesn't look very Perlish.

my $age = 20 as u32
my $birthday = cast("2020" as u32) - $age

But also, from a linguistic point of view, I'm not certain whether the reader benefits most from placing the type adjacent to the proposed value or to the proposed variable name. Referring again to the example (with some added whitespace), to me the types seem to get lost at the end of each line:

let $result = 10;    # i32 by default
let $age    = 20:u32;
let $sum    = 5:i32-15:i32;
let $mark   = 10:isize;
let $count  = 30:usize;

Versus

let $result       = 10;    # i32 by default
let $age    u32   = 20;
let $sum    i32   = 5 - 15;
let $mark   isize = 10;
let $count  usize = 30;

FWIW, casting is already sort of performed in Perl when we force context with scalar:

my $count = scalar @stuff

So would casting an array do the same thing?

my $count = @stuff as integer
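For comparison, here are the context-based "casts" Perl 5 already has for getting a count out of an array. None of them involves a type, only scalar context:

```perl
use strict;
use warnings;

my @stuff = ('a', 'b', 'c');

# Each of these puts @stuff in scalar context, which yields its element count.
my $count_scalar  = scalar @stuff;   # explicit scalar()
my $count_assign  = @stuff;          # assignment to a scalar
my $count_numeric = @stuff + 0;      # numeric operator
```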

Anyway, I didn't and don't recommend the minimal translation from Rust to "Perlish" as-is. I apologize for the confusion; I was hoping the reader would follow me on the journey of "transliteration".

I want to recommend what I feel is even more perlish:

let $age u32 = 20;
let($age, 'u32') = 20;

People really love Rust and to me Rust looks very perlish.

wbraswell commented 3 years ago

@Ovid I am very glad to hear that you are in favor of a real type system!

I believe RPerl addresses your requirements from today's post, where you said:

With a proper type system we need:

TYPE DECLARATION: You gave a simple variable declaration example:

my $int :isa(Int)   = 0;      # Serializes to the number zero
my $str :isa(Str)   = "0";    # Serializes to the string zero
my $bool :isa(Bool) = 0;      # Serializes to `false`

The equivalent code which already compiles in RPerl:

my integer $int = 0;   # compiles to (SV containing IV) in PERLTYPES mode, or (long int) in CPPTYPES mode
my string $str = "0";  # compiles to (SV containing PV) in PERLTYPES mode, or (std::string) in CPPTYPES mode
my boolean $bool = 0;  # compiles to (SV containing IV) in PERLTYPES mode, or (bool) in CPPTYPES mode

We also have support for unsigned_integer, character, and number types, which all become real C++ types as well.

The current behavior of the Perl interpreter allows us to assign incorrect data types during variable declaration (and pretty much any other time):

my integer $int = "0";  # incorrectly assigning a string value to an integer variable

Thankfully, we can expect RPerl (and most C++ compilers) to correctly give us warnings and errors when we try to compile this kind of incorrect code, as described in the "Type Checking" section below.

Because Perl and C++ do not behave the same in this regard, this is one of the cases which is ripe for improvement, both in the Perl interpreter and in the Perl compiler, to bring them into closer alignment with one another.
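Worth noting for this discussion: stock Perl 5 already parses the my TYPE $var form seen in the RPerl examples, as long as TYPE names an existing package. It does not enforce anything at runtime; perlfunc describes TYPE as currently bound to the fields pragma. Here is a minimal sketch, with Int as a hypothetical empty package used only as a label:

```perl
use strict;
use warnings;

# `my TYPE $var` parses as long as TYPE names an existing package.
# Int is a hypothetical, empty package used only as a label.
package Int { }

package main;

my Int $int = 0;               # accepted
my Int $not_int = "7 apples";  # also accepted: stock perl performs no check
```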

TYPE CASTING / CONVERSION / COERCION:

Because RPerl has a real type system which generates real C++ types, we are required to handle type casting (explicit conversion) and type coercion (implicit conversion).

In simple situations where both Perl and C++ can coerce types, we don't need to do much ourselves. For example, in most simple cases, it is safe for us to allow both Perl and C++ to coerce an integer into a (floating point) number when the context is expecting a number value.

When it is not safe or prudent to allow implicit conversion via coercion, RPerl provides explicit type conversion subroutines for casting from one type to another. For example, let's consider the example of converting an integer to a boolean in PERLOPS_PERLTYPES compile mode: https://github.com/wbraswell/rperl/blob/e2cd8f6dc34f6c6da8aa64cc3f69ebee297ba27b/lib/RPerl/DataType/Integer.pm#L98-L106

sub integer_to_boolean {
    { my boolean $RETURN_TYPE };
    ( my integer $input_integer ) = @ARG;
    integer_CHECKTRACE( $input_integer, '$input_integer', 'integer_to_boolean()' );
    if   ( $input_integer == 0 ) { return 0; }
    else { return 1; }
}

Here's the equivalent code for CPPOPS_PERLTYPES & CPPOPS_CPPTYPES: https://github.com/wbraswell/rperl/blob/e2cd8f6dc34f6c6da8aa64cc3f69ebee297ba27b/lib/RPerl/DataType/Integer.cpp#L81-L97

# ifdef __PERL__TYPES

SV* integer_to_boolean(SV* input_integer) {
    integer_CHECKTRACE(input_integer, "input_integer", "integer_to_boolean()");
    if (SvIV(input_integer) == 0) { return input_integer; }
    else { return newSViv(1); }
}

# elif defined __CPP__TYPES

boolean integer_to_boolean(integer input_integer) {
    if (input_integer == 0) { return (boolean) input_integer; }
    else { return 1; }
}

# endif

You can see above how the 3 different compile modes handle type conversion in virtually identical ways, albeit specific to the syntax of each compile mode. Since there are a small number of scalar data types, we have already implemented all the combinations of type conversion subroutines such as integer_to_boolean() and integer_to_string() and boolean_to_number() etc

You can find the type conversion subroutines in their respective data type files: https://github.com/wbraswell/rperl/tree/master/lib/RPerl/DataType

TYPE QUERIES:

We need to be able to perform runtime type checking in the 2 compile modes which utilize Perl data types, specifically PERLOPS_PERLTYPES (normal interpreted Perl) and CPPOPS_PERLTYPES (medium-speed compiled Perl).

Type queries are performed by passing your variable or data to the type_fast*() subroutines, the source code of which you can see in my above comment: https://github.com/Perl/perl5/issues/17894#issuecomment-657118621

I do not believe it is possible to perform runtime type queries of static C++ data types in CPPOPS_CPPTYPES compile mode, although compile-time type checks can still be performed if necessary.

TYPE CHECKING:

RPerl enables type checking of arguments passed to subroutines in both PERLTYPES compile modes. In PERLOPS_PERLTYPES, RPerl injects a call to integer_CHECK*() or number_CHECK*() etc. for each input argument, based on each argument's declared data type. If any argument does not match the appropriate type check, an error is generated.

You can see how RPerl declares the data types of input arguments in the following example subroutine. RPerl automatically performs type checking for the integer and number and string_hashref data types passed as input arguments, and as output the subroutine generates a string_hashref return data type. (The string_hashref data type composition issue is ignored for this code example.)

https://github.com/wbraswell/rperl/blob/master/lib/RPerl/CompileUnit/Module/Class/Template.pm#L151-L160

sub gorce {
    { my string_hashref $RETURN_TYPE };
    ( my integer $al, my number $be, my string $ga, my string_hashref $de) = @ARG;
    return {
        alpha => integer_to_string($al),
        beta  => number_to_string($be),
        gamma => $ga,
        delta => %{$de}
    };
}

Likewise, in CPPOPS_PERLTYPES compile mode, the compiler generates C++ source code which calls the equivalent integer_CHECK*() and boolean_CHECK*() etc. subroutines. These calls are placed at the start of their parent subroutines, which has exactly the same effect as PERLOPS_PERLTYPES compile mode and generates the equivalent error.

We do not need to perform our own runtime type checking in the CPPOPS_CPPTYPES compile mode, because it is handled at C++ compile time thanks to our RPerl data type system.
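For readers without RPerl, here is a plain-Perl approximation of the idea described above for PERLOPS_PERLTYPES mode: check each input argument against its declared type before the body runs. typed_sub and %check are hypothetical names. This is not how RPerl actually injects its integer_CHECK*() calls; it just shows the shape of the per-call runtime checking being discussed.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical leaf checks (not RPerl's integer_CHECK*() / number_CHECK*()).
my %check = (
    integer => sub { defined($_[0]) && !ref($_[0]) && $_[0] =~ /\A-?[0-9]+\z/ },
    string  => sub { defined($_[0]) && !ref($_[0]) },
);

# Wrap $body so every argument is checked against its declared type first.
sub typed_sub {
    my ($name, $types, $body) = @_;
    return sub {
        croak "$name: expected " . scalar(@$types) . ' arguments' unless @_ == @$types;
        for my $i (0 .. $#$types) {
            my $type = $types->[$i];
            my $ok   = $check{$type} or croak "$name: unknown type $type";
            croak "$name: argument $i is not a $type" unless $ok->($_[$i]);
        }
        return $body->(@_);
    };
}

my $repeat = typed_sub( 'repeat', [qw(string integer)], sub { $_[0] x $_[1] } );
```

Here $repeat->('ab', 3) returns 'ababab', while $repeat->('ab', 'three') and $repeat->('ab') croak before the body runs.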

4-PHASE PLAN FOR TYPE DECLARATION:

Back to my post from yesterday: @Ovid, what do you think about my proposed 4-phase approach for data type declarations?

https://github.com/Perl/perl5/issues/17894#issuecomment-657106014

Ovid commented 3 years ago

@wbraswell While I appreciate the time you've put into this, I think defining phases is premature because we need agreement on:

  1. There will be a type system
  2. Where and how the types will apply
  3. What the syntax will be
  4. What the semantics will be

We cannot define phases for the scope of work if we don't know the scope of work, or if there is even work.

Type systems are extremely hard to retrofit into existing languages because what the developers expect and what a type system expects are often at odds with one another.

wbraswell commented 3 years ago

@Ovid Okay that's fine, I shall continue to help build towards agreement on the 4 points you have listed.

Regarding the difficulty of retrofitting a data type system into Perl... Yes I know, I've already built a real type system for Perl, as detailed in all my previous posts. I'm at about 7.5 years of work on RPerl so far, approximately half of which is work on the type system.

I look forward to working with you to build the new type system in a way which will benefit us all.

tommybutler commented 3 years ago

...we need agreement on:

  1. There will be a type system
  2. Where and how the types will apply
  3. What the syntax will be
  4. What the semantics will be

I would submit that we definitely need a type system, and there seems to be a lot of agreement around this. Not sure that's really even a question at this point, so let me just weigh in by saying that it's my opinion that we do need it, and that's really all I can provide on that topic.

Where and how the types apply definitely bears further discussion. Hope we can keep talking about it.

What the syntax will be... Now we're getting to the meat and potatoes. I think now is the right time to discuss preferences around this. I see a lot of different opinions here, and I tend to agree with the opinions that we need to maintain a Perlish syntax. We shouldn't depart too far from what makes Perl Perl, and we should go with, how was it termed earlier? The principle of least surprise. I would submit that we need to go with the principle of greatest simplicity as well. I advocate for syntax along these lines, something like this...

my int $foo = 1;
my str $bar = 'hello @Ovid';

Or even these implicit "my" unless otherwise explicitly stated as "our" or "local"...

int $foo = 1;
str $bar = 'blah';

I do not like the overly verbose and heavily punctuated forms I've seen that demand the use of parentheses and colons where I can't see why we would need all of that. It's just extraneous line noise that plays into the perception that Perl is already write-only and indecipherable. Less is more. Less typing, and it's also more friendly to those of us who are dyslexic. Let's not require extra punctuation that we don't need. I'm quite serious about this because I think this is the starting point for the discussion that we need to have. We need to start talking about syntax. Hopefully this doesn't erupt into a holy war, but if we need to have a debate, let's do that. Let's figure out how we feel about things. I see this as the first step before we can really start talking about what code anyone is going to go and write.

Really, what the semantics will be... The 4th and final point. I think we need to figure out what the syntax will be before we go there, which is why I've jumped in and chimed in on this.

Again, thank you to everyone who is working hard on this. It's actually quite wonderful. Thank you so much for your efforts!