HaxeFoundation / haxe-evolution

Repository for maintaining proposal for changes to the Haxe programming language
111 stars 58 forks source link

[Open Discussion] Cast Specification #120

Open ncannasse opened 6 months ago

ncannasse commented 6 months ago

ATM in Haxe 4 we have different ways to cast a type to another:

I know it's a bit late but I think we should change this for Haxe 5, as this is the kind of big change we except and allow in a major release. My proposal would be to:

Aurel300 commented 6 months ago

Why introduce a keyword for downcast instead of adding a Std.downcast? Is it to avoid spelling out the type in the case that the variable is already declared of type Null<T>? (I think your last example should say Null<T> for null safety.)

hughsando commented 6 months ago

What about using the '?' operator for safe case, like safe access: v:T = cast expr // Throws v:T = cast? expr // returns null

On Sat, Mar 30, 2024 at 10:02 PM Aurel @.***> wrote:

Why introduce a keyword for downcast instead of adding a Std.downcast? Is it to avoid spelling out the type in the case that the variable is already declared of type Null? (I think your last example should say Null for null safety.)

— Reply to this email directly, view it on GitHub https://github.com/HaxeFoundation/haxe-evolution/issues/120#issuecomment-2028079291, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAMWTVRVN3VPV5OZPJNA573Y22ZVHAVCNFSM6AAAAABFLGWPKSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMRYGA3TSMRZGE . You are receiving this because you are subscribed to this thread.Message ID: @.***>

back2dos commented 6 months ago
  • the "safe cast", which is cast(expr,T)
    • at compilation : will ensure that the type of expr is a subtype of T so the cast is valid [...]

This not actually true. In safe casts nothing is ensured at compile time. This compiles without issues: cast ("foo", Int)

back2dos commented 6 months ago

I know it's a bit late but I think we should change this for Haxe 5, as this is the kind of big change we except and allow in a major release. My proposal would be to:

  • make the current unsafe cast syntax do the same thing as the safe cast, so var v : T = cast expr would be the same as var v = cast(expr,T).

Hmm. I think I can somehow understand the basic motivation, although it would help to understand what exactly the problem is you're trying to solve and weigh that against the impact of making such changes. I see some issues:

  1. The amount of extra JS code (effectively __interfaces__ and related RTTI) that safe casts and downcasts add is not negligible and it adds runtime overhead too. Doing either silently doesn't seem such a great idea.
  2. I noticed that the typed AST is often full of unsafe casts, especially when abstracts come into play. We don't really wanna replace all those with Std.unsafeCast nodes, right?
  3. What about var x:{ foo:Int } = cast bar;? Or what about var x:X = cast bar; where X is a type param? Or if one tries to work around variance issues with things like var x:ReadOnlyArray<Float> = cast arrayOfInts;?

While we're on the subject: Std.downcast is oddly specific IMO. AS3 had - in addition to cast - an x as T operator which was basically if (x is T) (cast x:T) else null, without requiring any type relationship between x and T. I often times find myself wanting that rather than Std.downcast.

EDIT: To expand on that thought. I think it would be nice to be able to convert any value to any type in a runtime checked fashion, without the overhead of try/catch and exceptions and what not. Right now, that is as I said if (x is T) (cast x:T) else null or in the words of this proposal if (x is T) (Std.unsafeCast(x): T) else null.

Given the overhead of exceptions, I'm having trouble to comprehend why the type conversion that produces them is pushed to become even more ubiquitous. Not that branching on type is a particularly advisable approach, but when it's done, it's typically for a good reason and it should be as cheap as it can possibly be, and as concise. I.e. the diametrical opposite of this:

try {
  var x = cast(y, X);
  x.doSomethingX();
}
catch (e:Dynamic) {
  someDefaultStuff();
}

That can't be good. On a slightly related note, Java (which despite recent developments still makes for a relatively conservative point of reference) now has these:

public static double getPerimeter(Shape shape) throws IllegalArgumentException {
     if (shape instanceof Rectangle r) {
         return 2 * r.length() + 2 * r.width();
     } else if (shape instanceof Circle c) {
         return 2 * c.radius() * Math.PI;
     } else {
         throw new IllegalArgumentException("Unrecognized shape");
     }
}
public static double getPerimeter(Shape shape) throws IllegalArgumentException {
    return switch (shape) {
         case Rectangle r -> 2 * r.length() + 2 * r.width();
         case Circle c    -> 2 * c.radius() * Math.PI;
         default          -> throw new IllegalArgumentException("Unrecognized shape");
     };
}

Possible syntax for this:

public static function getPerimeter(shape:Shape)  {
     return 
          if (shape is var r:Rectangle) 2 * r.length() + 2 * r.width();
          else if (shape is var c:Circle) 2 * c.radius() * Math.PI;
          else throw "Unrecognized shape";
     }
}
public static function getPerimeter(Shape shape)  {
    return switch shape {
         case (r:Rectangle): 2 * r.length() + 2 * r.width();
         case (c:Circle): 2 * c.radius() * Math.PI;
         default: throw "Unrecognized shape";
     };
}

I do kinda like the idea that (x:X) captures a value as x if it is X and skips the pattern otherwise.

ncannasse commented 6 months ago

I agree the is/cast needs rework, hence my post to open up other possibilities. Following your Java idea, one could be to be able to introduce a new variable as soon as you have tested for ìs`, for instance:

var r : Any;
if( r is Array<String> )  return r[0].length;
if( r is String && r.length > 0 ) return -1;

That would translate to:

if( r is Array<String> ) { var r' : Array<String> = cast r; return r'[0].length; }
if( r is String ) { var r' : String = cast r; if( r'.length > 0 ) return -1; }

And some tricky cases:

if( r is Bool ) r = 0; // assign should still work
if( r is Int ) r = r > 0 ? "" : null; 
// should error as you cannot both use the retyped variable and assign it with its new type ?
back2dos commented 6 months ago

I don't really like that. It only works on variables and adds the tricky cases you described. And one can easily think up even worse ones:

var r : Any;
function trololo() {
  r = 42;
}
if( r is Array<String>) { trololo(); return r[0].length }; // ohnoooo!

I want to evaluate an arbitrary expression, runtime type check it and on success have it in a separate variable. It's more general and avoids the edge case you constructed.

Hence I propose EVar, but of course there are multiple options:

I know that this is a bit quirky in that a variable is declared in a condition to then be scoped into a branch. At the same time, that's not unlike variables declared in loop heads being scoped into the body.

Another thing to consider is that such type checks should probably only allow && and not ||, so (syntax aside) if (e1 is (a:A) && e2 is (b:B)) is valid, but if (e1 is (a:A) || e2 is (b:B)) isn't, because it's not clear what would happen in the branch.

Here's a POC implementation via macro: https://try.haxe.org/#3ba77723


But I think we've sorta entangled two questions:

  1. Should we silently turn all unsafe casts to safe casts, with quite a few ramifications?
  2. Would it be nice to have syntax that allows to check a value against a type at runtime and on success use it as that type in a branch?

I was the one to table the second question, primarily because I think if we had syntax that is both concise and safe, we could drag along cast as is. I'm even inclined to say that unsafe casts should be left in for people who (think they) know what they do, and for macros and what not, and that rather safe casts should - at some point - be deprecated in favor of a newer non-throwing syntax.

ncannasse commented 6 months ago

Regarding the cast specification change:

I think we all agree that var v : T = cast expr and var v = cast(expr,T) should do exactly the same thing. It was a mistake in my original design to have such similar constructs with two different behaviors, it's quite confusing for newcomers. From there there's two options (a) deprecate one of the two syntax or (b) unify their behavior:

I agree there's not a lot of very good solutions here but I think we should not let this slip for Haxe 5. The is variable discussion is another topic so maybe let's focus on cast for now.

Aidan63 commented 5 months ago

If cast expr were changed to be the same as safe casts would that also have to apply to the pointer casting the c++ target provides? I.e. would it need to start using dynamic_cast instead of the old c style casts it currently generates when the casting pointer interop types? If that were treated as a special case with the c++ target and its behavour kept the same it would make casting code even harder to reason about, although this might be acceptable as the difference is likely to only be found on the boundaries between haxe and c++ externs.

Somewhat related to the is discussion, C# also has that syntax, but also has the as keyword to cast a variable to a type or null if invalid (In C# the old C style syntax throws if invalid, hence the null return). While that behaviour matches our Std.downcast behaviour similar syntax could be used if we want to deprecate one of the current casting techniques.

var v = expr as T

back2dos commented 5 months ago

I think we all agree that var v : T = cast expr and var v = cast(expr,T) should do exactly the same thing. It was a mistake in my original design to have such similar constructs with two different behaviors, it's quite confusing for newcomers.

Haxe - like most languages - has a few oddities that might confuse newcomers, although I think I would put Map waaayyy ahead of this. Most of them are quite trivial and are much more easily understood than when and how to use Haxe's many features. I agree that the manual could perhaps be slightly less frugal.

I concede that if we were discussing adding the same functionality today, it would seem like a bad idea to make both look so similar.

But it is how it is and it's not outrageously bad. Even Wikipedia distinguishes checked and unchecked casts - which fully correspond to our safe and unsafe casts.

If we still want to change this (instead of properly documenting it) I definitely think that changing the spec is the worst possible way to approach it. Newcomers are great and all, but this would be a clusterfuck for existing users, with a pretty lousy migration path. A better alternative would be to deprecate both uses of cast and come up with better alternatives for each.

I never fully understood the purpose of cast (e, T). Either I know it is safe, then I don't want the runtime overhead, or I don't then I don't want to deal with exceptions, but rather make the check myself and follow up with an unsafe cast in the branch. So I think more concise syntax for runtime type check + branching would make a decent alternative to offer to the user, while deprecating safe casts. There are multiple proposal here already, and if this is something we wish to pursue I would suggest a separate discussion to arrive at a decent solution.

As for unsafe casts, they are useful in places where the user knowns that some expression will be off some value, but the compiler does not. The easiest example is right after a runtime type check of course. But in general cast e is useful for type erasure, which unfortunately is needed quite often, especially when dealing with variance, type parameters or externs. I would very much welcome if we could have something similarly succinct (or even shorter).