[web]: discrepancies between JS and Wasm backends with -O3 or higher.

dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.

https://dart.dev

BSD 3-Clause "New" or "Revised" License

10.21k stars 1.57k forks source link

[web]: discrepancies between JS and Wasm backends with -O3 or higher. #56949

Open sigmundch opened 2 days ago

sigmundch commented 2 days ago

Unsafe optimizations levels like -O3 and -O4 in dart2js and dart2wasm don't promise any guarantees around semantic behavior if assumptions are wrong. However, they currently take different routes and can cause friction for developers that support both JS and WASM outputs or that are migrating from one to the other.

Take this minimal example:

void main () {
  try {
    final Map<String, dynamic> map = {};
    int value = map['key']; // implicit null check
    print(value); // (1)
  } catch (e) {
    print('value was null'); // (2)
  }
}

Here:

The implicit cast and null check on the assignment to int value fails in -O2 and lower for both compilers, and gets caught by the try-catch block (2).
When using -O4, dart2js will print null on (1) instead.
When using -O4, dart2wasm will hit a wasm trap and exit with a runtime error that is not caught in (2).

Unfortuantely, flutter tools default to using -O4, and it is not obvious to developers that the discrepancy in semantics is coming from unsafe optimizations.

osa1 commented 1 day ago

I think the only way to fix some of these discrepancies is by implementing Dart properly in both targets. For example:

void main () {
  final List<int> ints = [1, 2, 3];
  final int i = ints[10];
  f(i);
}

f(i) {
  print(i.runtimeType);
  print(i);
}

Here the out-of-bounds access will cause a Wasm trap, which cannot be caught from within the same Wasm execution. So the execution of this code will just halt at that point. In JS you get an undefined, which you can pass around, test for etc.

So the effect of omitting the same check in both platforms are different.

Making dart2wasm work like dart2js here requires adding the bounds check back. But there isn't an undefined equivalent in Wasm, so once we do that we should throw a Dart error, per spec.

Making the dart2js work like dart2wasm is probably not possible, and certainly not desirable.

So the only way to make both work the same way is to add the bounds check back, and once we do that we can just implement the proper Dart semantics. (throw an exception)

lrhn commented 1 day ago

I don't think we should make any guarantees about what unsafe optimizations do on different platforms, and especially not that they do the same thing. We're leaving Dart semantics, and how that looks depends entirely on which underlying semantics we fall back to instead.

You should be developing and testing without unsafe optimizations, so that any bug actually throws. Defaulting to -O4 in development is asking for trouble. The difference between unsafe optimizations on different platforms should only matter if you are debugging in production. (Which does happens.)

osa1 commented 1 day ago

You should be developing and testing without unsafe optimizations, so that any bug actually throws. Defaulting to -O4 in development is asking for trouble.

This wouldn't help finding the bug in https://github.com/flutter/devtools/issues/8452. The original code was the following:

static FilterTag? parse(String value) {
  final parts = value.split(filterTagSeparator);
  try {
    final useRegExp = parts.last == useRegExpTag;
    final query = parts[0].trim();
    final settingFilterValues =
        (jsonDecode(parts[1]) as List).cast<Map<String, Object?>>();
    return FilterTag(
      query: query,
      settingFilterValues: settingFilterValues,
      useRegExp: useRegExp,
    );
  } catch (_) {
    // Return null for any parsing error.
    return null;
  }
}

If you run this without unsafe optimizations the catch block catches the errors and this works as expected.

In unsafe mode, the catch block doesn't catch the errors because the errors turn into traps (halts the execution, cannot be caught).

The way to test this is with unsafe optimizations trapping when things go wrong, which is exactly what the release mode does.

lrhn commented 1 day ago

The code in question should at least have used on Exception to catch parsing exceptions, maybe even on FormatException, and checked that parts has at least two elements. We have two style guide entries ([1], [2]) for that, one of them with a lint.

Doing parts[1] + catch(_) instead of checking that there is a parts[1] is discouraged, even if you do it indirectly. If it's not deliberate, which is more likely, this is a good example of why those "use an on type" recommendations exist. If you do catch (_) {/* ignore error */ }, you hide all errors, including bugs in your own code. (If anything, the untrappable WASM error was a bonus if it led to this bug being discovered.)

I guess the conclusion would be to enable more lints, and also test with -O4, on all platforms, if you want to know that you find all problems. And that testing can't find all bugs, especially if the code actively conceals errors.

sigmundch commented 1 day ago

Completely agree here. I'd very much would like the default to not be -O4. At first, we should make it easier to make -O2 available for development and debugging, but with time I'd like the default to change. I think the right approach for this, which @osa1 also mentioned in an internal thread, is to use scoped pragmas as @rakudrama has suggested in the past: then let framework developers apply the pragmas selectively in performance critical logic, but leave application code with the checks.

I like the idea of enabling lints to detect scenarios that hide assumptions made by -O4. As both of you said, one such lint should be to require precise catch clauses. Are there are others to include?

/cc @yjbanov @eyebrowsoffire - any reservations with these ideas for switching the default optimization flags for flutter web over time?

/cc @bkonyi - the flutter tool today exposes a way to switch dart2js builds to use different optimization levels, I was wondering if we could generalize this to apply to wasm too?

osa1 commented 14 hours ago

Regarding testing unsafe code, another idea could be adding a mode to both dart2js and dart2wasm (and any other target that omits runtime checks) to convert the checks we want to avoid in production mode into some kind of crash/failure that is not possible to catch and handle in Dart.

So basically the debug mode, but runtime checks omitted in release mode stop execution when failed, in a way that cannot be accidentally handled.

Stopping the execution can be done with a trap in Wasm, but I don't know if it's possible to do in JS.

We could even make this the only debug mode, which effectively changes runtime error semantics of the language by making them impossible to catch and handle. (because both debug and release modes of the compiler makes them impossible to catch)

osa1 commented 13 hours ago

From https://github.com/dart-lang/sdk/issues/56655#issuecomment-2339754977:

Our breaking change policy says that it's not breaking to change behavior of code that throws an Error. It would also say that changing unspecified behavior is not considered breaking.

So you really shouldn't depend on catching and handling Errors. I think this is in support of the idea that the debug mode perhaps by default should just turn Errors into uncatchable things (e.g. a trap in Wasm).

rakudrama commented 5 hours ago

From #56655 (comment):

Our breaking change policy says that it's not breaking to change behavior of code that throws an Error. It would also say that changing unspecified behavior is not considered breaking.

So you really shouldn't depend on catching and handling Errors. I think this is in support of the idea that the debug mode perhaps by default should just turn Errors into uncatchable things (e.g. a trap in Wasm).

I am wary of special debugging modes.

Apps need the ability to report back to the server crashes of every kind. This is how some problems are found and debugged. Sometimes developers need to resort to interplanetary print-debugging, by pushing fresh builds that contain more logging until someone can understand the problem.

How actionable are wasm traps? Can the complete stack be deobfuscated at the server?