dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License
10.06k stars 1.56k forks source link

`const Symbol()` constants are not turned into symbol constants by kernel2kernel constants transformation #34911

Open sjindel-google opened 5 years ago

sjindel-google commented 5 years ago

E.g.:

void main() {
  print(const Symbol("x"));
}

Results in this kernel (I modified ast_to_text.dart to show constant types):

main = test::main;
library from "file:///usr/local/google/home/sjindel/src/dart-sdk/sdk/test.dart" as test {
  static method main() → dynamic {
    core::print(#C2);
  }
}
// ...
constants  {
[StringConstant]:   #C1 = x
[InstanceConstant]:   #C2 = dart._internal::Symbol {dart._internal::_name: #C1, }
// ...
mkustermann commented 5 years ago

The test

import 'package:expect/expect.dart';

main() {
   Expect.isTrue(identical(const Symbol('x'), #x));
}

passes just fine. So the behavior of our AOT compiler & runtime seems correct!

sjindel-google commented 5 years ago

This matters for obfuscation, that test will not pass in obfuscated mode. Also we need this do perform library name mangling when the argument to "const Symbol()" is a private identifier (#34904).

lrhn commented 5 years ago

I don't know what "obfuscatated mode" is, but if it is anything like minification, then it should be able to work.

You must not do name mangling in the const Symbol constructor, even if we allow symbols starting with underscores. The constructor is just a normal constructor creating an object containing its argument.

Would it be better to move the specification of const Symbol(constantString) into the language specification to ensure that it is handled correctly?

The current constructor docs say:

Constructs a new Symbol representing the provided name.

The name must be a valid public Dart member name, public constructor name, or library name, optionally qualified.

A qualified name is a valid name preceded by a public identifier name and a '.', e.g., foo.bar.baz= is a qualified version of baz=. That means that the content of the name String must be either a valid public Dart identifier (that is, an identifier not starting with "_"), such an identifier followed by "=" (a setter name), the name of a declarable operator (one of "+", "-", "*", "/", "%", "\~/", "&", "|", "^", "\~", "<<", ">>", "<", "<=", ">", ">=", "==", "[]", "[]=", or "unary-"), any of the above preceded by any number of qualifiers, where a qualifier is a non-private identifier followed by '.', or the empty string (the default name of a library with no library name declaration). Symbol instances created from the same name are equal, but not necessarily identical, but symbols created as compile-time constants are canonicalized, as all other constant object creations.

assert(new Symbol("foo") == new Symbol("foo"));
assert(identical(const Symbol("foo"), const Symbol("foo")));

If name is a single identifier that does not start with an underscore, or it is a qualified identifier, or it is an operator name different from unary-, then the result of const Symbol(name) is the same instance that the symbol literal created by prefixing # to the content of name would evaluate to.

assert(new Symbol("foo") == #foo);
assert(new Symbol("[]=") == #[]=);
assert(new Symbol("foo.bar") == #foo.bar);
assert(identical(const Symbol("foo"), #foo));
assert(identical(const Symbol("[]="), #[]=));
assert(identical(const Symbol("foo.bar"), #foo.bar));

This constructor cannot create a Symbol instance that is equal to a private symbol literal like #_foo.

const Symbol("_foo") // Invalid

So Symbol("foo._bar") is disallowed because we disallow private identifiers everywhere, not only at the beginning. The VM currently accepts it.

sjindel-google commented 5 years ago

Mangling is not the same as obfuscation. Symbol mangling refers to modifying private symbols to be inaccessible from other libraries. Obfuscation refers to changing the entire symbol (mangled or not) consistently across the program to a randomized name.

This issue is about obfuscating symbols created via const Symbol() constructor. Issue #34904 is about mangling.

The problem is that currently we don't obfuscate symbols created via const Symbol() constructor. This means that tests like Martin's example above break in obfuscated mode.

mkustermann commented 5 years ago

The constructor is just a normal constructor creating an object containing its argument.

@lrhn That is precisely the current implementation. So I assume the following test needs to always pass?

import 'package:expect/expect.dart';
main() {
   Expect.isTrue(const Symbol('x') == new Symbol('x'));
}
sjindel-google commented 5 years ago

In obfuscated mode we cannot have both that test and the original test pass.

lrhn commented 5 years ago

Yes, the == operator on Symbol is defined such that new Symbol('x') == new Symbol('x'). As for all other constant classes, const Symbol('x') is just a particular new Symbol('x') that all similar expressions have been canonicalized to. It's not a different kind of object. There does not need to be any magic in the const constructor (apart from the extra validation of the input).

Or, more preciely:

The next question is what obfuscation promises. Is it semantics preserving or not?

If obfuscation is a source transformation, it may not promise anything. It creates a completely new program with potentially different behavior if the original program relied on source names. It's no different from a clever search and replace of source names, with no effect on strings. Afterwards, #foo might be #_xQ2, but Symbol("foo") is still Symbol("foo"). If the program uses source names in strings, those are (obviously) not obfuscated anywhere else either, unless the obfuscation is clever enough to recognize ArgumentError.notNull("sourceString") as having a source name as argument.

If obfuscation is intended as a semantics preserving transformation, then you need to retain a source-name to symbol table. That obviously ruins the obfuscation. Maybe you can keep some obfuscation by only keeping a hash of the source name in the table. That way you can't just reverse the obfuscation, but you can validate a guess at a name.

Obfuscation will also break code that depends on the toString of Type or Object or Symbol. That's just inevitable, and minimized code has lived with that so far.

sjindel-google commented 5 years ago

Right, there are clearly limitations in obfuscation. The goal is that is preserves semantics as much as possible, with preference toward language features that are more popular (so more programs will run correctly with obfuscation enabled).

The question remains though, which of the following principles are better to break: identical(#foo, const Symbol("foo")) or const Symbol("foo") == new Symbol("foo")

lrhn commented 5 years ago

I can argue philosophical purist principles all day, but in practice, I don't think new Symbol is used that much. You will make more programs work correctly if you obfuscate all constant symbols, and give up on the dynamically created ones. It's an easy rule to remember for users too: Never use obfuscation if your code uses new Symbol.

Obfuscation is necessarily a whole-program transformation (that, or it's quite fragile), so you should be able to report it if an obfuscated program contains any non-constant Symbol constructor invocations. (Or, if the non-constant invocations have constant arguments, you can fix them anyway).

sjindel-google commented 5 years ago

Thanks for the advice, that makes sense to me. In this case we need to proceed with transforming const Symbol() invocations into Kernel symbol constants in the kernel2kernel constant transformation.