dart-lang / language

Design of the Dart language
Other
2.65k stars 202 forks source link

[extension-types] Use the extension type erasure during flow analysis? #3534

Closed eernstg closed 8 months ago

eernstg commented 9 months ago

Thanks to @sgrekhov for bringing up this issue. See also https://github.com/dart-lang/sdk/issues/54444 where a similar situation is mentioned.

Consider the following program:

extension type ET(int i) {}

int f() {
  switch (ET(1)) {
    case int _: return 0;
  }
}

This program is rejected by the analyzer and the CFE, reporting that f might reach the end of the function body (in which case it would return null, which is not type correct because the return type is non-nullable).

The reason for this behavior is, presumably, that the flow analysis considers the type of the scrutinee to be the extension type ET, and it does not consider the extension type erasure int.

However, the extension type erasure is the most precise information we have about the situation at run time, and the sub/super-type relationships of extension types to other types is not helpful during flow analysis.

I'd recommend that flow analysis should use the extension type erasure to determine whether or not a case like case int _ is guaranteed to match the scrutinee. The erasure should be applied to the type of the matched value as well as to any object patterns whose type is an extension type.

Note that there is a certain overlap between this issue and https://github.com/dart-lang/language/issues/3524.

@dart-lang/language-team, WDYT?

lrhn commented 9 months ago

There are two places where this matters.

The exhaustiveness algorithm, which just gives a plain result of exhaustive or not. It has to use the erased type.

The flow analysis, which includes both control flow analysis, type inference and type promotion. It should probably still use the erased type for flow analysis, because that's what matters at runtime. The analysis is a static approximation of runtime behavior, and runtime control flow behavior is defined entirely in terms of erased types.

Then there is the static type analysis. The example above is trivial. Consider instead:

extension type ET(int i) {}
extension type TO(int i) implements ET {}
int f() {
  ET x = ET(1);
  switch (x) {
    case int y: 
      x.expectStaticType<Exactly<ET>>;
      y.expectStaticType<Exactly<int>>;
    case TO z: // Unreachable, I know.
      x.expectStaticType<Exactly<TO>>;
      z.expectStaticType<Exactly<TO>>;
  }
}

Here the int y pattern does not promote x since it's not a subtype, and the type of y is the matched type. For TO z, the test does promote x, and z has the declared type.

I think the static types should just fall out of the current definition, so it really is only control flow analysis that needs to care about the erased type. Including "can this pattern match that value" logic.

So, in conclusion: Flow analysis must use erased type, because it must match runtime behavior to be sound. If it concludes that any case is not reachable, when it is, then that's an exploitable bug. If it concludes that any case is reachable, when it isn't, then that's probably not a bug, just an inaccuracy. But still annoying and unnecessary.

I don't see any reasonable alternative to using the erased types.

leafpetersen commented 9 months ago

So, in conclusion: Flow analysis must use erased type, because it must match runtime behavior to be sound.

@lrhn this is concerning if true, but I don't think it is demonstrated by your example or discussion. Can you elaborate?

lrhn commented 9 months ago

I'm really just using the definition of "sound" as "static analysis correctly predicts runtime behavior". We usually use it about the static type "analysis" where we statically assign a type to each expression, which should correctly, conservatively, predict the type of value of that expression at runtime.

If the analysis doesn't actually predict runtime behavior, it's not a sound approximation of the runtime behavior. And that's really the most important part, for a static analysis.

Whether that actually leads to type-unsoundness depends on what we use the analysis for. I'm assuming that we actually use the result of the analysis for something, otherwise it's all moot.

For flow analysis, that's usually whether some code may run or not. If we end up incorrectly assuming that all code can potentially run, then that's probably safe. It's the conservative fallback position.

If we incorrectly assume that some code definitely cannot run, or definitely will run, then it's probably possible to escalate that to type- or definite-assignment-unsoundness.

Maybe the incorrect assumption will just end up making a switch non-exhaustive, where it should be exhaustive, which stops itself from ever running. Or prevent promotion that would otherwise be correct, but not be usable to do an unsound promotion. Maybe. It's not that I have concrete examples.

Let's try.

Let's say that flow analysis assumes that a type check for extension type const Ex(int value){} doesn't match a value of type int, or vice versa, and that makes a particular switch case unreachable.

extension type const Ex(int value) {}
void main() {
  int i = someInt();
  int x;
  switch (i) {
    case >0: x = 1;
    case Ex(value: var n): print("$n here");
    case _: x = -1;
  }
  // x definitely assigned here?
}

If flow analysis predicts that Ex() cannot match, and therefore that particular case body will never be chosen, then it should make x definitely assigned after the switch. That's wrong if i is zero.

If it doesn't predict that case won't be executed, then it's because we don't actually use the analysis result, which is trivially safe. But then it doesn't matter what we do.

(I can't come up with an example where analysis could predict a match that cannot happen in practice, which it wouldn't predict on the erased type too. If we have such a case, it would be the kind that could potentially cause an incorrect exhaustiveness result.)

leafpetersen commented 9 months ago

If the analysis doesn't actually predict runtime behavior, it's not a sound approximation of the runtime behavior.

This is not true at all. Essentially all static analyses will fail to predict some runtime behaviors. All non-trivial static analyses must over-approximate, which means they fail to predict some runtime behaviors. Soundness in general is the property that all actual possible executions are included in the approximation (along with possibly others), but by its nature, over-approximation means that the analysis may be failing to predict things (e.g. which path through a switch is taken). The inferences that you draw from the analysis must account for over-approximation if it is relevant.

For flow analysis, that's usually whether some code may run or not. If we end up incorrectly assuming that all code can potentially run, then that's probably safe. It's the conservative fallback position.

Right. There are lots of places where we choose not to promote (e.g. because not a subtype), and therefore include in the set of executions some that are in fact not possible.

If flow analysis predicts that Ex() cannot match, and therefore that particular case body will never be chosen, then it should make x definitely assigned after the switch. That's wrong if i is zero.

Right. This would be unsound, because the approximation of the feasible execution paths computed by the analysis would not include some of the actual execution paths. We shouldn't do this. This isn't really a new example for extension types though - we can already have a switch where one of the cases is an object pattern which is not subtype related to the scrutinee but the case is still reachable. The following example is correctly rejected, since it is possible for the two classes to share a hidden subtype.

class A {}
class B {}

void test(A a) {
  int x;
   switch (a) {
      case B(): print "hello";
      case _ : x = 1;
  }
  print(x);
}

For essentially the same reason, a use of x after the switch in your example is correctly rejected by the current implementation: there is morally a hidden shared subtype of int and E (namely, int).

So I don't think I see anything that we need to do here - as best I can tell, flow analysis is behaving soundly, which is the main thing. Now it's certainly true that we could choose to do something differently here, but I'm not sure I see the motivation.

lrhn commented 9 months ago

We can always give a less precise analysis. I do worry that with done extension types being must-exhaust, and others not, the ones that are not will stand out if they are the only ones using the worse analysis.

Look at the example in #54444. The switch statement is an error because it's not exhaustive, the switch expression with the same cases is considered exhaustive. All because the switch is not must-exhaust, and the flow analysis considers it "maybe not exhaustive". If it had been "must exhaust", the flow analysis would have deferred to the exhaustiveness algorithm.

We've accepted that there are done differences between the two algorithms, where the flow analysis can't be as precise, but this difference seems unnecessary. It can be just as precise here, by using the erased type.

stereotype441 commented 9 months ago

(Joining this thread late because I was on vacation)

When it comes to flow analysis, my definition of "sound" is something like: "flow analysis is sound if and only if, for all possible program inputs, at every point in program execution, the source code associated with the program counter is at a location that flow analysis considers reachable, and the value in each local variable has a runtime type that's a subtype of its promoted type."

Under that definition, if flow analysis fails to promote a variable that it could have promoted, or fails to notice that a certain code path is unreachable, then that isn't a soundness issue.

So, from the standpoint of soundness, I agree with Leaf that we don't need to do anything here.

But of course, soundness isn't the only goal of flow analysis. Another goal of flow analysis that we took into consideration during the development of pattern support is the goal of reducing the mismatch between the exhaustiveness checker and flow analysis's notion of reachability. That's important because whenever these two algorithms disagree, it tends to lead to user confusion. The example code in https://github.com/dart-lang/sdk/issues/54444 is exactly the kind thing we ran into while we were implementing patterns--a switch statement that is considered exhaustive by the exhaustiveness algorithm but not by flow analysis, so the user gets a confusing error message (the analyzer says the code after the switch statement is reachable, but adding a default clause causes a warning from the exhaustiveness checker).

The mechanism we already have in place to try to reduce this sort of confusion is that when flow analysis is handling any sort of pattern match that implicitly performs an is T test on the scrutinee, if the matched value type is a subtype of T, then for the purpose of computing reachability, flow analysis considers the is T test to be guaranteed to pass.

(Incidentally, this is the mechanism that allows flow analysis to understand that a switch statement ending with case var x: fully covers the scrutinee, because the inferred type of x is the type of the scrutinee.)

This mechanism is specific to pattern matching, and is independent of the rest of the logic of flow analysis. I believe it would be fairly straightforward to update it to perform type erasure.

That would mean that this code (from @eernstg's initial example) would be allowed:

extension type ET(int i) {}

int f() {
  switch (ET(1)) {
    case int _: return 0;
  }
}

As would this example from @lrhn's comment:

extension type ET(int i) {}
extension type TO(int i) implements ET {}
int f() {
  ET x = ET(1);
  switch (x) {
    case int y: 
      x.expectStaticType<Exactly<ET>>;
      y.expectStaticType<Exactly<int>>;
    case TO z: // Unreachable, I know.
      x.expectStaticType<Exactly<TO>>;
      z.expectStaticType<Exactly<TO>>;
  }
}

(and flow analysis would understand that the TO z case is unreachable).

Since we're past the cutoff for Dart 3.3 Beta 3, this change would have to be implemented and cherry-picked before the release deadline (Feb 14), so we should take the tight schedule into account when deciding what to do.

stereotype441 commented 8 months ago

In today's language team meeting, some folks asked for clarification about the catch-22 that a user might run into due to differences in behavior between the exhaustiveness checker and flow analysis. I've written up an example. It's not quite as bad as I thought.

Imagine the user writes this code (which I would argue is somewhat contrived but not absurd):

extension type FancyString(String s) implements String {
  String fancify() => '**$s**';
}

String describeFancily(String s) {
  switch (s) {
    case '':
      return 'The empty string';
    case FancyString fs:
      return 'The string ${fs.fancify()}';
  }
}

Attempting to run this code results in a compile-time error:

../../tmp/proj/test.dart:5:8: Error: A non-null value must be returned since the return type 'String' doesn't allow null.
String describeFancily(String s) {
       ^

(This is because detecting a missing return is the responsibility of flow analysis, and flow analysis doesn't understand that the switch is exhaustive).

If the user tries to fix the error by changing the switch statement to this:

  switch (s) {
    case '':
      return 'The empty string';
    case FancyString fs:
      return 'The string ${fs.fancify()}';
    case _:
      throw 'UNREACHABLE';
  }

Then the analyzer reports:

info - test.dart:11:5 - This case is covered by the previous cases. Try removing the case clause, or restructuring the preceding patterns. - unreachable_switch_case

(This is because detecting an unreachable case is the responsibility of the exhaustiveness checker, and the exhaustiveness checker is capable of figuring out that case _: is unreachable).

So the catch-22 is, if the user doesn't include the unreachable case _:, they get an error that seems like it would be fixed by adding it; if they do include it, they get a warning suggesting that they should remove it.

However, it's not quite as bad of a catch-22 as I previously thought, because for some reason, default: doesn't trigger the unreachable_switch_case warning. (Maybe that's an analyzer bug, though. Since default: behaves equivalently to case _:, it seems like it ought to trigger the same warnings. I've filed https://github.com/dart-lang/sdk/issues/54575 to track this.)

stereotype441 commented 8 months ago

Note: I've prototyped this change. It turned out to be fairly straightforward (3 lines of code, plus moving an abstract method from one interface to a super-interface). It's also fairly targeted (the only logic affected is the computation of the coversMatchedType boolean in _FlowAnalysisImpl.promoteForPattern, which decides whether a pattern match's implicit is check is guaranteed to succeed). So if we decide we want to move forward with this (and cherry-pick it), it should be fairly low risk.

I still need to clean up the CL, add some test coverage, and verify that no other tests are broken by the change; I'll work on that today.

munificent commented 8 months ago

I know I said the opposite in the last language meeting, but my feeling is that exhaustiveness checking should not erase the extension type and used the representation type when the matched value type or any of the case types are extension types.

Using an extension type in a switch that does type tests is a deeply weird operation so it's unlikely that anything particularly elegant will fall out. It's as weird and problematic as using is or as on extension types. (I would be tempted to make it a static error to have an extension type appear on the RHS of is, in a switch's matched value type, or as an object pattern when the extension type isn't a supertype of the matched value type. But that's a bridge too far.)

But, in general, the user experience we try to offer for extension types is that the representation type is as encapsulated as we can make it without sacrificing the fundamental performance goal of erasing the extension type. So if the user hasn't chosen to make an extension type reveal the representation type, then as many operations as possible should treat the extension type as opaque.

To me, that extends to exhaustiveness checking. Even though, yes, the runtime type tests will use the representation type, the user did choose to wrap them in extension types for some reason, so the static analysis should work on that wrapped type.

It's annoying weird if they get stuck with an error without a default case and a different error with one. But I think the odds of running into that are relatively low. When they do run into it, they should be able to avoid it by either not using the extension type in the case, or wrapping the matched value. So:

String describeFancily(String s) {
  switch (FancyString(s)) { // Make matched value a fancy string too.
    case '':
      return 'The empty string';
    case FancyString fs:
      return 'The string ${fs.fancify()}';
  }
}

String describeFancily(String s) {
  switch (s) { // Make matched value a fancy string too.
    case '':
      return 'The empty string';
    case var s:
      return 'The string ${FancyString(s).fancify()}';
  }
}

One way to look at this is that the origin sin is extension types can establish a subtype relation with their representation type but not equivalence to it. In:

extension type FancyString(String s) implements String {
  String fancify() => '**$s**';
}

FancyString f = 'hi';

You get an error because FancyString is a subtype of String and not a supertype. Likewise, this isn't exhaustive:

extension type FancyString(String s) implements String {
  String fancify() => '**$s**';
}

String describeFancily(String s) {
  switch (s) {
    case '':
      return 'The empty string';
    case FancyString fs:
      return 'The string ${fs.fancify()}';
  }
}

Exhaustiveness thinks that FancyString is a subtype of String and that there may be other subtypes and possibly other instances of String that are not FancyString. But, because of how extension types work, what users actually know to be true is that FancyString is equivalent to String. Every String is a FancyString and every FancyString is a String.

You could imagine extending the language so that if an extension type declares that it implements its representation type, then we treat the extension type as equivalent to the representation type.

That would allow assignment in both directions:

String s = FancyString('hi');
FancyString f = 'hi';

And then exhaustiveness checking would know that a case of an extension type covers all values of the matched value type.

eernstg commented 8 months ago

@munificent wrote:

exhaustiveness checking should not erase the extension type and used the representation

Please note that this is a different topic: The exhaustiveness analysis has been specified to use extension type erasure since more than 6 months ago (it's there in the first version of the feature specification where the feature is called 'extension types'; I haven't tracked it down to the time where it was added to the inline class feature specification, or even earlier proposals). It has been implemented and tested for quite a while.

The question here is just whether we should also perform extension type erasure at the relevant points in the flow analysis, such that we avoid getting catch-22 advice from the two analyses, as described by @stereotype441 here.

as many operations as possible should treat the extension type as opaque.

I think some extension types have this encapsulated nature (they are "closed"), and other extension types are essentially interface extension devices: They are adding a bunch of members to an existing type, or shadowing an existing member by another one (a forwarder) whose member signature is considered better for some reason (these extension types are "open").

I actually included keywords like open and closed in several versions of the extension type feature specification proposal (years ago when it had lots of other names), but I never got enough support for making this distinction explicit in terms of modifiers.

So what we have today is a distinction which is based on implemented types:

extension type FancyString(String it) implements String {...} // Open.
extension type SecretString(String it) {...} // Closed.

The criterion is that the extension type implements a non-extension type (directly or indirectly). So FancyString is open because it implements String, and SecretString is closed because it doesn't implement any non-extension types (other than Object?, implicitly).

Note that it's a compile-time error unless that implemented non-extension type is a supertype of the representation type. In other words, this implements relation ensures that the extension type voluntarily announces that it "is" the representation type or a supertype thereof. This seems to be a natural way to indicate that this representation type isn't a secret, or at least part of it is announced publicly.

We're using this lint to flag each location in code where a dynamic inspection is used to transition a given object from one type to another (e.g., by promoting a variable, casting an expression (as), testing the type of an expression (is)) such that an extension type is assumed for an object where that wasn't previously the case.

For instance, it would lint myString as SecretString, myString is SecretString, if (myString case SecretString _) ..., and so on, because this means that the type SecretString is assumed for an object whose static type was previously String, and that relationship is (weakly) encapsulated.

To avoid the lint, call a constructor. If you can't do that then it is probably a hint from the developer that you shouldn't do it.

In contrast, no lint message is emitted at myString as FancyString, myString is FancyString, if (myString case FancyString _) ..., and so on.

The situation is somewhat convoluted, but I still think it's worth considering the "soft encapsulation" support which is already coming (that is, the lint and the reliance on implements NonExtensionType). Do you think you could live with that level of encapsulation support?

lrhn commented 8 months ago

If we don't let an extension type exhaust its representation type, or vice versa, it becomes impossible to exhaust an extension type at all, except by the type itself or a wildcard.

That would prevent exhaustiveness for a use like:

extension type const Option<T>._(({T value})? _) {
  const factory Option.some(T value) = Some<T>;
  static const Option<Never> none = None._(null);
}
extension type const Some<T>._(({T value}) _) implements Option<T> {
  const Some(T value) : this._((value: value));
  T get value => _.value;
}
extension type const None._(Null _) implements Option<Never> {
  const None._opt() : this._(null);
}

which you can currently exhaust as:

void main() {
  var b = DateTime.now().millisecondsSinceEpoch.isEven;
  Option<int> maybeInt = b ? const Option.some(42) : Option.none;
  var intWithDefault = switch (maybeInt) {
    Some(: var value) => value,  
    None() => 0,
  };
  print(intWithDefault.toRadixString(16));
}

(I know I keep dragging up this example. I don't have a lot of example, but something like this is not unreasonable. I'll say that it's a likely use-case, maybe not precisely this, but then something else on top of a nullable type.)

The code above works today. It's exhaustive, and detected as such. (The only issue is that the analyzer gives some warnings about an Option<int> not being able to match a Some<int> pattern, which is just wrong, no matter how we measure it — they're subtypes, and so are their representation types, so no matter which one we use, it shouldn't look impossible to match).

If we say that a Some<int>(: var value) cannot exhaust the type ({int value}), and it definitely cannot exhaust a part of the extension type Option<int> (because Option<int> is not a sealed type, so it doesn't introduce sub-spaces itself, we don't have sealed extension type), then there is *no way* to exhaustOptionusingSomeandNone, or *any other pattern exceptOption()or_`*.

That feels like taking away a useful feature.

I know it feels like looking below the abstraction, but it's the author of all three types that is doing that. They are defined to fit together, they know the relationship between the representation types.


So, is there some way we can restrict this to only types that are designed together (and already know each other's representation types), and not necessarily allow just anybody to add a type from the side?

Consider: If E is an extension type with representation type S.

  1. If E implements S, then the space for E is equivalent to the space for S (either can exhaust the other).
  2. If E2 with representation type S2 implements E, and S2 \<: S1, then E2 works towards exhausting E1 the same way as S2 does towards exhausting S.

(If that makes sense at all. I think it does. I can be wrong.)

That is, an "open" extension type like extension type FancyInt(int _) implements int { more members } counts as the type it adds members to.

A parallel hierarchy like Option above allows exhausting to apply to the representation types.

extension type E1(bool? b) { ... }
extension type E2(bool b) implements E1 { ... }  // Parallel hierarchy

extension type E3(E1 _) implements E1 { ... } // Adding more members.

extension type E4(E2 _) implements E1 { ... } // Not included! Does not count towards exhausting E2 or E1.

It's complicated, but it does allow some reasonable use-cases, but also doesn't allow just any unrelated type to exhaust another. There needs to be a direct implements relation, which either introduces a subtype/subspace of the representation type, or introduces more members on essentially the same type.

I'd rather keep the current "just use erased representation type" approach than completely prevent sub-extension types from being able to exhaust supertypes.

eernstg commented 8 months ago

If we don't let an extension type exhaust its representation type, or vice versa, ...

Please note again that this is a different topic: The exhaustiveness analysis has been specified to use extension type erasure since more than 6 months ago, and It has been implemented and tested for quite a while. This issue is about the flow analysis, about whether or not it should be aligned more closely with the existing exhaustiveness analysis.

stereotype441 commented 8 months ago

Note: I've prototyped this change. It turned out to be fairly straightforward (3 lines of code, plus moving an abstract method from one interface to a super-interface). It's also fairly targeted (the only logic affected is the computation of the coversMatchedType boolean in _FlowAnalysisImpl.promoteForPattern, which decides whether a pattern match's implicit is check is guaranteed to succeed). So if we decide we want to move forward with this (and cherry-pick it), it should be fairly low risk.

I still need to clean up the CL, add some test coverage, and verify that no other tests are broken by the change; I'll work on that today.

The fullly cleaned-up CL is here: https://dart-review.googlesource.com/c/sdk/+/345822.

In addition to the 3 lines I mentioned earlier, and moving an abstract method from one interface to another, I had to change the analysis of cast patterns; it was previously relying on the coversMatchedType boolean in _FlowAnalysisImpl.promoteForPattern to determine whether to report an "unnecessary cast" warning (under the assumption that if the type cast was guaranteed to succeed, that meant that it was unncessary). With extension type erasure, that assumption is no longer valid.

I still believe it's worth moving ahead with this change (assuming we retain the decision to use extension type erasure in exhaustiveness analysis), and cherry-picking this CL to the beta branch. But I'll wait for a consensus from the language team before moving forward with this.

leafpetersen commented 8 months ago

My read on this is that unless we want to revisit using the erased type for exhaustiveness, then the consistent thing to do is to use the erased type in flow analysis here as well. Does anyone feel that, given no time/implementation constraints, we would prefer in fact keep the current yes/no split?

Changing exhaustiveness to not use the erased type feels like a big and risky change to make now. If we feel strongly that this should be done, I might be inclined to do this as a breaking change in a follow up release (or at least a patch release). It feels like a dangerous change to be making with such a short run up.

So the best options to me seem to be to either:

Does that seem right? Thoughts, disagreement?

eernstg commented 8 months ago

Does anyone feel that, given no time/implementation constraints, we would prefer in fact keep the current yes/no split?

Just to make it explicit: The yes/no split may cause a catch-22-ish situation with the static analysis, as described by @stereotype441 here.

stereotype441 commented 8 months ago

On Wednesday, @leafpetersen said:

My read on this is that unless we want to revisit using the erased type for exhaustiveness, then the consistent thing to do is to use the erased type in flow analysis here as well. Does anyone feel that, given no time/implementation constraints, we would prefer in fact keep the current yes/no split?

Changing exhaustiveness to not use the erased type feels like a big and risky change to make now. If we feel strongly that this should be done, I might be inclined to do this as a breaking change in a follow up release (or at least a patch release). It feels like a dangerous change to be making with such a short run up.

So the best options to me seem to be to either:

  • Accept the specified behavior for exhaustiveness and fix flow analysis in a cherry pick.
  • Leave flow analysis unchanged and change exhaustiveness in a patch release.

Does that seem right? Thoughts, disagreement?

I didn't respond last week because I wanted to give others a chance to weigh in, and I think my opinions are clear from my previous comments on this issue (I favor accepting the specified behavior and fixing flow analysis in a cherry pick, which is what https://dart-review.googlesource.com/c/sdk/+/345822 does).

Since we haven't had any further discussion (other than @eernstg's helpful clarification), and the ship date for Dart 3.3 is in 3 weeks, I'm inclined to move forward with my fix.

If I don't hear objections by tomorrow, I'll send out https://dart-review.googlesource.com/c/sdk/+/345822 for review, and then, when it lands, I'll start the cherry-pick process.

lrhn commented 8 months ago

Ship it!

stereotype441 commented 8 months ago

Since the fix has landed (https://github.com/dart-lang/sdk/commit/97edad1568015a3b556d5365fde83d2cf8ca7d81), I'm going to close this issue now. The only remaining work is to cherry-pick the fix to the beta branch; that work is tracked in https://github.com/dart-lang/sdk/issues/54720.