Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

[feature] Types in Perl #17894

Open Ovid opened 4 years ago

Ovid commented 4 years ago

This will be a somewhat contentious issue, and I'll be a bit pedantic at times for the benefit of those who are reading this ticket but don't understand all of the issues involved (that includes me). In particular, I'm going to give a long, rambling justification which I'm sure P5P doesn't need, but it's here to give background to everyone else reading this.

TL;DR

We need to standardize our type syntax and semantics.

(And yes, TL;DRs need to be at the top of documents, not the bottom)

Typed Signatures

Dave Mitchell has been doing awesome work on subroutine signatures, something that is long overdue in the language (I honestly expected them as part of the Perl 6 project back in 2000).

Part of his proposal deals with types in signatures. The proposal is impressive and, from the synopsis, we have this:

sub f(
        $self isa Foo::Bar,         # croak unless $self->isa('Foo::Bar');
        $foo  isa Foo::Bar?,        # croak unless undef or of that class
        $a!,                        # croak unless $a is defined
        $b    is  Int,              # croak if $b not int-like
        $c    is  Int?,             # croak unless undefined or int-like
        $d    is PositiveInt,       # user-defined type
        $e    is Int where $_ >= 1, # multiple constraints
        $f    is \@,                # croak unless  array ref
        $aref as ref ? $_ : [ $_ ]  # coercions: maybe modify the param
) { ...};

Interestingly, the very first response starts with this:

Yuck. This is a huge amount of new syntax to add. The new syntax doesn't pull its weight, given that it can only be used in this one context. If you're adding a bunch of syntax for type constraints, it should also be available for type checking purposes outside signatures.

There are a number of interesting comments about the proposal, but I want to focus on my primary concern: "type checking purposes outside signatures".

Long, Rambling Justification

When I work on large systems, one of the most frequent bugs I encounter is when data X is passed from foo() to bar() to baz() to quux() and while quux() was expecting an integer, it received something that was not an integer.

If I'm very lucky, the code dies a horrible death and I have to walk back through the call chain to figure out exactly where the bad data originated and frankly, I'd rather rip out my intestines with a fork than have to do that again.

If I'm really unlucky, however, the code doesn't die. Instead, it just silently gives terribly bad, wrong, no good rubbish. And no warning at all that something has gone wrong.

$ perl -Mstrict -Mwarnings -E 'say [] + 1'
140473498905217

Oh, that's not good. So let me validate my argument with a regex!

$ perl -Mstrict -Mwarnings -E 'say [] =~ /\d/ ? "Kill me now" : "Whew!"'
Kill me now

Ah, so I need to be more careful.

$ perl -Mstrict -Mwarnings -E 'my $d = []; say defined $d && !ref $d && $d =~ /\d/ ? "Kill me now" : "Whew!"'
Whew!

OK, that's better. Finally I'm safe.

$ perl -CAS -Mstrict -Mwarnings -E 'my $d = chr(43270); say defined $d && !ref $d && $d =~ /\d/ ? "Kill me now: $d " : "Whew!"'
Kill me now: ꤆

I don't even know what ꤆ is (Google tells me it's part of the Paris metro line, but I'm a wee bit skeptical on that), but I know I forgot the /a switch on my regex. And I'll bet most casual Perl developers don't know about the /a switch, and I know for a fact that most large systems don't try to validate their types because it's a pain, it's more grunt work, and it's fraught with error. Or as I like to say "It'̸s ̴a pai̶n, ͝it͏'͠s͢ ̛m͞o҉ré grun͝t̕ ̴w̛ork, a̸nd ̧it's͘ f́ra̛ught wi̧th ̴er̕r̴o̷r̵."
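To make the point concrete, here is a minimal sketch of the kind of check being described. The helper name is_ascii_int is hypothetical and not part of any proposal; it only illustrates that /a restricts \d to ASCII digits and that the anchors are needed to reject partial matches.

```perl
use strict;
use warnings;

# Hypothetical helper: true only for defined, non-reference values made up
# entirely of an optional minus sign followed by ASCII digits.
sub is_ascii_int {
    my ($value) = @_;
    return 0 unless defined $value;
    return 0 if ref $value;
    # /a limits \d to 0-9; \A and \z anchor the match to the whole string.
    return $value =~ /\A-?\d+\z/a ? 1 : 0;
}

print is_ascii_int(42)         ? "int" : "not int", "\n";
print is_ascii_int([])         ? "int" : "not int", "\n";
print is_ascii_int(chr(43270)) ? "int" : "not int", "\n";
```

Even this small check has four separate conditions to get right, which is rather the point of the complaint above.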

So I applaud David's work, but then there's Cor.

Cor Types

As many of you know, Cor is intended to be the new object system for the Perl core. If you don't want to wade through the wiki, you can watch this talk I gave on Cor.

One thing I briefly touched on but didn't get into is typing. So, here's a pointless Python Point class to illuminate this point:

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def inverse(self):
        return Point(self.y,self.x)

point = Point(7,3.2).inverse()
print(point.x)
point.x = "foo"
print(point.x)

As you can see, at the end, I set x to the string "foo". How can I prevent that when working on a million-line code base? Well, according to Pythonistas, it's "unpythonic" to validate your arguments. Even Perl developers, largely via the Moo/se family of OO, seem to have grudgingly admitted that yeah, asserting your types isn't such a bad thing.

So while Dave Mitchell's been thinking about types in signatures, I've been thinking about them in Cor. Here's the above point class, with almost identical behavior:

class Point {
    has ($x, $y) :reader :writer :new :isa(Num);

    method inverse() {
        return Point->new( x => $y, y => $x );
    }
}

Except in the Cor world, calling $point->x("foo") would generate a runtime error, just as it would with Moo/se code. In the above, we have "slots" (instance data) declared with has. The attributes merely provide sugar for common things we need in OO systems. Thus, :isa(Num) provides my run time type checking.
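For readers who want to see roughly what that run-time check amounts to, here is a sketch in plain Perl that runs today. It is not how Cor is implemented; _check_num is a hypothetical stand-in for the Num constraint, and using looks_like_number as its definition is an assumption.

```perl
package Point {
    use strict;
    use warnings;
    use Carp qw(croak);
    use Scalar::Util qw(looks_like_number);

    # Hypothetical stand-in for :isa(Num); croaks on anything non-numeric.
    sub _check_num {
        my ($name, $value) = @_;
        croak "$name must be a number"
            unless defined($value) && !ref($value) && looks_like_number($value);
        return $value;
    }

    sub new {
        my ($class, %args) = @_;
        return bless {
            x => _check_num(x => $args{x}),
            y => _check_num(y => $args{y}),
        }, $class;
    }

    # Generate a combined reader/writer per slot; every write is validated.
    for my $name (qw(x y)) {
        no strict 'refs';
        *{"Point::$name"} = sub {
            my $self = shift;
            $self->{$name} = _check_num($name, shift) if @_;
            return $self->{$name};
        };
    }

    sub inverse {
        my $self = shift;
        return Point->new(x => $self->{y}, y => $self->{x});
    }
}

my $point = Point->new(x => 7, y => 3.2)->inverse;
my $ok    = eval { $point->x("foo"); 1 };
print $ok ? "accepted\n" : "rejected: $@";
```

The amount of boilerplate here compared with the single has line above is the argument for putting the check in the declaration.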

And that brings me to the next problem.

SYNTAX

Traditionally, we tend to see types defined in front of the variables, such as declaring an integer in C: int c. In Dave's proposal, it's after the variable: $c is Int. In Cor, we have has $c :isa(Int). Of course, there's also this loveliness:

$ perl -E 'package Int {} my Int $c = "Not an Int"; say $c'
Not an Int

But that syntax has been with us for years and is largely ignored and if we want to attach additional semantics to the type (e.g., coercions), having it after the variable instead of before is probably a good idea.

In short, optional typing in Perl is long overdue, it's planned for signatures, it's planned for Cor, and eventually someone will want to write:

package Foo {
    my $c :isa(Int) = 3;
    # or
    my $c is Int = 3;
    # or
    ...
}

But if we have types, we desperately need to ensure that "there's more than one way to do it" doesn't apply (sorry, Perlers!). Because if signatures use one type syntax, Cor uses another, and regular variables possibly use one or the other (heaven forbid we get a third!), then it's going to be a confusing mess that frankly, I don't want to try to deal with.

And then there's this bit from Dave's proposal:

$e is Int where $_ >= 1, # multiple constraints

I quite like that, but I'm unsure how I would fit the constraints into Cor's syntax. That being said, Cor provides additional behavior to class slots via attributes and it might be a touch disappointing to have an exception here (but I'd live with it).

SEMANTICS

Syntax is nice, but the meaning of the syntax is important. For example, I think we can agree that for a type system, an integer shouldn't match ꤆, even if /\d/ does. But what does my $c :isa(Int) = -7/3; produce?

#include <stdio.h>
int main() {
   int c = -7/3;
   printf("%d",c);
   return 0;
}

The above C code compiles without a warning and prints -2. Perl, historically, doesn't do the "integer math" stuff and tries to avoid throwing away information:

$ perl -E 'say -7/3'
-2.33333333333333

This is in sharp contrast to other dynamic languages which often get this spectacularly wrong:

$ ruby -e 'puts -7/3'
-3

So, what does Perl do with my $c :isa(Int) = -7/3;? Should it just throw away the extra data? Should it be an error? Should it be a warning? Should the type be ignored?
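The options in that list can be sketched in today's Perl. The function store_int and its policy names are hypothetical, invented only to make the four choices concrete; nothing here is proposed syntax.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical: decide what ends up in an Int-typed variable under a policy.
sub store_int {
    my ($value, $policy) = @_;
    return $value if $value == int($value);    # already integral

    # Throw away the fraction, truncating toward zero as the C example does.
    return int($value) if $policy eq 'truncate';

    croak "$value is not an Int" if $policy eq 'die';
    warn "$value is not an Int\n" if $policy eq 'warn';

    # 'warn' and 'ignore' both keep the value unchanged.
    return $value;
}

print store_int(-7/3, 'truncate'), "\n";
print store_int(-7/3, 'ignore'),   "\n";
```

Each policy is easy to implement; the hard part is agreeing on which one the language guarantees.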

And I'm not even going to try to figure out a type hierarchy right now, but an Int is a Num while the reverse isn't true. However, we'll need one well-defined and standardized, along with an extension mechanism.

wbraswell commented 4 years ago

I generally agree with @tommybutler

We can create type aliases, so the long-hand and short-hand type names will be the same.

RPerl type names are purposefully expressive AKA long-hand:

my boolean $foo = 0;
my unsigned_integer $bar = 23;
my integer $bat = -23;
my number $bax = -23.4;
my character $baz = 'a';
my string $buz = 'howdy';

We can create aliases for those who feel compelled to save a few characters of typing, short-hand type names:

my bool $foo = 0;
my uint $bar = 23;
my int $bat = -23;
my num $bax = -23.4;
my char $baz = 'a';
my str $buz = 'howdy';

Ovid commented 4 years ago

Just to be clear, while I like the my $foo :isa(Int); syntax, I'm not wedded to it. If we create a sound type system and it involves a different syntax that the community is happy with, I'm OK with that. The main reason I prefer that syntax is because it gives us tremendous consistency with other "adjectives" which modify the noun variable. For example:

has $cutoff :isa(PositiveInt) :reader :builder;

In the above, everything which modifies $cutoff goes after the variable and has a more or less consistent syntax. Further, it seems (to me) to have the potential to be more extensible as we figure out new kinds of types we can drop into :isa(...) (perhaps via something like use My::Types qw(...);).

Again, that's a preference, not an insistence.

Aside from the "overly punctuated" syntax, I also tend to prefer postfix type declarations the way Go does them. This is for the same reason that I find French to sometimes be more understandable due to usually putting less important information (adjectives) after the more important information (nouns). Linguistically, these are called post-positive adjectives.

For example, if we're talking about the "fast, red, old, wet car", we have to keep the ideas of "fast, red, old, and wet" in our head so that when we hear the noun, "car", we can map those adjectives correctly. If we were talking about a "fast, red, old, wet cat", the meaning of those adjectives would all subtly change once we hit the noun. But if the noun comes first, we can easily put the adjectives in context: "the cat fast, red, old, wet". It's not something English speakers are as used to, but in Romance languages, putting the noun first provides great clarity. This is because we instantly understand the context and then can map the variations of the context easily. If you're less familiar with English, trying to understand all of the adjectives before the noun can be a struggle. But I may be overthinking this.

Again, I'm not wedded to this idea. That's a preference, not a mandate (remember, I have zero authority here).

wbraswell commented 4 years ago

I vote for the cleaner syntax:

my int $foo;

It is more intuitive and concise than the "overly punctuated" syntax:

my $foo :isa(Int);

duncand commented 4 years ago

While having the type on the left of the variable is common practice in many languages, and may work best for Perl, I generally prefer having it on the right on the basis of consistency. The consistency relates to pairing: with name+type pairs and name+value pairs, the name is consistently on the left, and the extra stuff with it is on the right. A common and important example of this is that routine parameter declarations and named routine arguments mirror each other structurally. While I wouldn't call it a well-designed language, this consistency is something I like about SQL.

hvds commented 4 years ago

I vote for the cleaner syntax:

my int $foo;

It is more intuitive and concise than the "overly punctuated" syntax:

my $foo :isa(Int);

That may be more intuitive for simple types, but I think it scales less well for complex types. I'm reminded (perhaps unfairly) of the delights of function pointer types in C. A sufficiently complex type may well want to span multiple lines (e.g., if some aspect of it must be a reference to a sub with a complex signature), and there would be distinct value in having the variable name first for such situations.

Of course a sane developer would declare such a type up front to give it a shorter name, but I think the underlying point remains - put the arbitrarily complex thing last, even if you expect it to be simple in almost all cases.

djzort commented 4 years ago

English is inconsistent, for example: The wet cat ran fast.

It's a lot of fun. Anyway.

Semantically, where the type lies also implies what it's doing.

my $foo = 5:u32

Seems to imply that $foo is generic and 5 is being cast as a u32, which would hardly be different from the following...

my $foo = "5";
my $foo = 5;

Most of the time you won't notice the difference between these, unless you are using placeholders in DBD::Oracle, which looks at what perl thinks the variable is internally as the hint for what type to tell Oracle. In the above case you end up having to do something like 0+$foo to force it to an integer so queries don't randomly break.
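The internal difference between those two assignments can be observed with the core B module. This is only a sketch: storage_hint is a hypothetical helper, and the flag test it uses is an assumption for illustration, not necessarily the exact check any database driver performs.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: report 'number' when the scalar carries an integer or
# float value and no string value, otherwise 'string'.
sub storage_hint {
    my $flags = B::svref_2object(\$_[0])->FLAGS;
    return 'number'
        if ($flags & (B::SVf_IOK | B::SVf_NOK)) && !($flags & B::SVf_POK);
    return 'string';
}

my $text   = "5";
my $number = 5;
my $forced = 0 + $text;    # the 0+$foo trick: a fresh numeric value

print join(", ",
    storage_hint($text), storage_hint($number), storage_hint($forced)), "\n";
```

Note that $text itself still reports as a string after being used in 0 + $text, because it keeps its string value; only the new scalar is purely numeric.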

To that point, when the variable is declared as a typed container it's implied that perl has to coerce or reject the value:

int $foo = 5;
dbl $foo = 5; # should perl die or quietly make this a double?
dbl $foo = 5:dbl; # ok now we are 100% certain this is what you want.

Anyway so that's worth touching on, and that hungarian notation might be the newest policy recommendation in PBP...

Returning to the fondness for my(), to remain perlish the syntax needs to have versions with and without sigils.

my int $foo = 5;

and

my(int($foo)) = 5; # ?

druud commented 4 years ago

See also Lexical::Types.

Ovid commented 4 years ago

@apparluk You seem to have some spam in one of your replies. Can you edit that, please?

nguyenhothanhtam0709 commented 1 year ago

Is there any update?

leonerd commented 1 year ago

Is there any update?

Not so far. I believe a reasonable summary of the situation could be "Sure; something like this would be good. Please provide details and implement it". :)

Ovid commented 1 year ago

@leonerd At this point, I've been juggling too much and, rather than drop some balls, I've put some down. As a result, Oshun seemed like one to put down because I believe you mentioned you were going to take a stab at something. If I'm wrong, my apologies.

If I remember correctly, then you'll take that stab. Or, if you like, I can write up a subset of what was proposed for Oshun and present it as a PPC so we have a place to start having structure. I'd try to write it with the syntax you seem to favor. I'm not bothered by which so long as something happens. Should I spend some time writing up the PPC, or will you go ahead and work on this, or is another approach warranted?

leonerd commented 1 year ago

@Ovid My current plan is