dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License

Going from object to raw runtime type / performance of runtime type stringification #48901

Open mkustermann opened 2 years ago

mkustermann commented 2 years ago

We have some customers who take advantage of the string representation of runtime types. On the Dart VM this will perform a runtime call and is therefore quite slow - even in simple (i.e. non-generic-class, non-internal-class) cases.

In particular, the widely used package:built_value relies on <obj>.runtimeType.toString() for serialization purposes. It uses it mainly for mapping generic types back to their raw types (e.g. Foo<X> -> Foo), since a serializer is associated with a raw type (irrespective of the type arguments used).

API for raw types

One could think about introducing an API that would allow one to obtain the raw type of an object (e.g. Object.rawRuntimeType or Object.runtimeType.rawType - the former being faster in many cases). This would be advantageous for performance due to avoiding the need to go via intermediate stringification. @eernstg @leafpetersen @lrhn Do you think we could make this happen?

Given that Flutter also uses runtime types in the framework code, I wonder if there's a use case there as well for this. /cc @Hixie

Cache stringification in Dart

We do offer stringification of runtime types, and the strings may also be used for wire representations. package:built_value could cache the strings of runtime types in a map, though there would be no guarantee that 3rd party user code also uses such a cache. Even if we moved the cache to the core platform, any newly spawned isolate would need to re-populate it.
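A minimal sketch of such a Dart-level cache, assuming a plain top-level map keyed by Type. The names `_typeNames` and `cachedTypeName` are hypothetical and not part of package:built_value or the SDK. Top-level state is per isolate, so every isolate would fill its own copy, which is the limitation mentioned above.

```dart
// Hypothetical per-isolate cache: Type -> its string representation.
final Map<Type, String> _typeNames = <Type, String>{};

/// Returns the cached string for [object]'s runtime type, calling
/// `toString()` on the type only the first time that type is seen.
String cachedTypeName(Object object) {
  final type = object.runtimeType;
  return _typeNames[type] ??= type.toString();
}

void main() {
  final first = cachedTypeName(<int>[1, 2, 3]);
  final second = cachedTypeName(<int>[4]);
  // Both calls resolve to the same cache entry and the same String object.
  print(identical(first, second));
  print(_typeNames.length);
}
```

The lookup still costs a hash of the Type object per call, which is the overhead a slot on the type itself would avoid.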

Cache stringification in VM

Such a cache could live in the VM itself and be made available to all isolates.

An alternative would be for the VM to add a slot to our AbstractType that holds the cached string. It would be lazily created on first stringification and be ready for subsequent uses. The advantage would be avoiding the map lookup as well as sharing the cache across isolates. We would pay with a slot plus the memory of the strings, though many such strings would already be present as symbols, making this possibly less of a concern.

=> There's a question of whether <obj>.runtimeType objects are canonicalized (in which case they live forever, and this approach makes more sense) or not.
=> We may want to do an analysis of big apps to see how many type objects there are at runtime and thereby estimate the possible memory cost.

(In an ideal world applications wouldn't be relying on the runtime types at all but in some cases they seem to be too convenient or have no easy alternative)

/cc @mraleph @davidmorgan

eernstg commented 2 years ago

... Object.rawRuntimeType ...

We have had proposals for an instance member runtimeClass which would be the same as runtimeType for an instance of a non-generic class, but it would be a canonical representation of the class (perhaps the instantiation to bounds, or a super-bounded type where every type argument is dynamic) for instances of generic classes. Sounds like rawRuntimeType could be a very similar idea.

We might want to have something other than an instance member. For instance, it could be a static member of Object, such that we don't have to think about when/how/if it is appropriate to override it. We could also simply make it a compile-time error to do so, just like Enum.index.

It should be rather easy to manage a caching scheme for the toString() of any value returned by runtimeClass, because any program contains a fixed set of classes, and we just need to cache one string per class.

davidmorgan commented 2 years ago

Thanks for the detailed writeup Martin--just two notes from me

mkustermann commented 2 years ago

/cc @rakudrama for dart2js perspective

lrhn commented 2 years ago

I'm definitely against adding instance members to Object (and I want to remove runtimeType if possible).

Static members are better:

Type.of(object) => object.runtimeType; // And deprecate Object.runtimeType!
Type.classOf(object) => 
  - if object's runtimeType is a generic interface type, then that type instantiated to bounds 
    (what you get if you write it as a raw type literal).
  - otherwise, the runtime type of the object (which is a non-generic interface type or a function type).

They could be factory constructors, but there is no reason for that, so just static members on Type.

That does mean that objects can no longer lie on their runtime type.

Could also add an instance member on Type:

/// The "raw" type corresponding to this type.
///
/// If [this] type object represents a generic interface type,
/// the raw type is the `Type` object representing the same interface
/// type, only instantiated to its bounds.
/// That's the type you get if you write the raw class name as a type literal.
///
/// If this type is not a generic interface type, its raw value is itself.
Type get raw;
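
To illustrate what "instantiated to its bounds" means in today's language (no new API involved), here is a small sketch using a hypothetical class `Box`. It assumes Dart 2.15 or later, where a type with explicit type arguments can be written as an expression. The raw type literal denotes the bounds instantiation, while an instance's runtimeType keeps the actual type argument; that difference is the gap a `raw` getter would bridge.

```dart
// Hypothetical generic class with a bounded type parameter.
class Box<T extends num> {}

void main() {
  // A raw type literal is instantiated to bounds: `Box` means `Box<num>`.
  print(Box == Box<num>);

  // The runtime type keeps the actual type argument...
  final Type actual = Box<int>().runtimeType;
  print(actual == Box<int>);

  // ...so it differs from the raw type, and a map keyed by `Box` would miss.
  print(actual == Box);
}
```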

@eernstg Would "views" be treated like interface types, and have a raw variant?

eernstg commented 2 years ago

Would "views" be treated like interface types, and have a raw variant?

When an object is accessed under a view type, we will execute code where the actual type arguments are in scope. This means that view types can not be super-bounded (that would be a soundness violation). So view types can be raw, and the meaning would be defined by instantiation to bounds, but if the given type isn't regular-bounded then it is a compile-time error.

lrhn commented 2 years ago

I guess (ViewType<ValidArgument>).raw would throw in the case where ViewType does not have a valid instantiate-to-bounds. Can that happen for class types too, or will it always be super-bounded?

eernstg commented 2 years ago

It is possible to create a class type that does not have an instantiation to bounds. (But it would always be possible to use C<dynamic, dynamic, ...> when C is a class where all type parameters are covariant, that's just not the result which is obtained by running the i2b algorithm, and the result from i2b is not always well-bounded.)

typedef A<X> = X Function(X);
class C<X extends A<X>> {}
C c = C<Never>(); // Error, `C` does not have an i2b.

lrhn commented 2 years ago

Would allowing .raw (with the specified behavior, or something close) be a big cost for AoT compilers? We need some amount of runtime type information in order to go from a Type object for A<T> to one for A (instantiated to bounds). That information might just be one pre-allocated Type object for each generic class, mixin or view, if we already have a way to go from the type for A<T> to the class/mixin/view A.

@rakudrama again for a dart2js perspective on this too!

mkustermann commented 2 years ago

Would allowing .raw (with the specified behavior, or something close) be a big cost for AoT compilers?

Probably not for the VM.

What would .raw do on a Type representing a function type?

lrhn commented 2 years ago

Doing .raw on any type which is not an instantiation of a generic class or mixin (or view, eventually) will do nothing, and return the original type. It's as raw as it gets.

eernstg commented 2 years ago

We could actually define .raw or Type.classOf(_) to return Function when it encounters a function type. I suspect that developers who are using this feature would often only want to call it in order to obtain an identification of a user-visible class, and if a function type contains C<T> and no other operation in the program needs the value of T then this could make the difference between being able to erase the type argument of C and not erasing it.

(This might work particularly well if we introduce declaration-site variance, such that we will have fewer situations where such actual type arguments are needed at run time, but it could also be useful on web platforms where such checks can be omitted even in cases where it is not sound).

Hixie commented 2 years ago

For Flutter I believe we only use type stringification for debugging purposes and not in production.

rakudrama commented 2 years ago

@davidmorgan What are the properties of the minified x.runtimeType.toString() that are useful to you?

dart2js has a unique identifier for each type. lib1.MyClass and lib2.MyClass have the names MyClass and MyClass0, or some different shorter inscrutable minified names, with the tag minified: to allow tools to look up the original name from the debug info (source-map). These names can change each time you edit anything in the program or SDK.

Disallowing overrides of Object.runtimeType

I don't think it will be easy to get rid of Object.runtimeType and its overrides. Without them, 4000000000.runtimeType would depend on the platform:

  - _Smi: 64-bit VM with 64-bit pointers
  - _Mint: 32-bit VM and 64-bit VM with compact pointers
  - JSInt: dart2js

The lie that all of these have a runtimeType that is the interface type int is less confusing when printed.

static Type.classOf vs instance Type.raw

I prefer the static method Type.classOf to the instance method Type.raw because it is more direct. Even at the surface level, it is one call in the user's program instead of two.

On dart2js, every type used for instance types, type parameters and type tests is represented by an Rti object.

The path for instance.runtimeType is already quite tortuous.

  1. Get the Rti object for the instance. In the general case this is slow because the Rti is stored in many places: sometimes directly on the object, sometimes on the JavaScript constructor, but in full generality it is also stored on an interceptor (method table), and for functions it has to be computed by a $signature method.

  2. Canonicalize the Rti to a potentially new Rti with all the 'star' types removed. The star-free Rti is cached on the original Rti.

  3. Create a Type instance that wraps the star-free Rti. This is cached on the star-free Rti.

  4. Calling toString() walks the Rti tree to construct the string. It would be reasonable to cache the string on the Type object. Type objects are not really const in the dart2js runtime, so they can have a mutable field.
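
The caching chain in steps 2 to 4 can be modelled roughly as below. This is an illustrative model only: `Rti`, `TypeImpl`, and all their members are hypothetical stand-ins rather than the real dart2js runtime classes, 'star' removal is reduced to a string replace, and walking the Rti tree is reduced to reading a description.

```dart
/// Hypothetical stand-in for a runtime type representation.
class Rti {
  Rti(this.description);
  final String description;

  Rti? _starFree; // step 2 cache: the canonical star-free Rti
  TypeImpl? _type; // step 3 cache: Type wrapper, stored on a star-free Rti

  /// Step 2: drop the 'star' markers, caching the result on this Rti.
  Rti get starFree => _starFree ??= description.contains('*')
      ? Rti(description.replaceAll('*', ''))
      : this;

  /// Step 3: wrap the star-free Rti, caching the wrapper on that Rti.
  TypeImpl get asType => starFree._type ??= TypeImpl(starFree);
}

/// Hypothetical Type wrapper with a mutable slot for the cached string.
class TypeImpl {
  TypeImpl(this.rti);
  final Rti rti;
  String? _name; // step 4 cache
  int buildCount = 0; // counts how often the string is actually built

  String _buildName() {
    buildCount++;
    return rti.description;
  }

  /// Step 4: build the string once, then reuse it.
  @override
  String toString() => _name ??= _buildName();
}

void main() {
  final rti = Rti('List<int*>*');
  final type = rti.asType;
  print(identical(type, rti.asType)); // the wrapper is reused
  print('$type $type'); // stringified twice
  print(type.buildCount); // number of times the string was built
}
```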

If we add a .raw getter, we would need to cache that too on the Type object.

I would expect that Type.classOf would build a cache from the JavaScript constructor to the Type instance that would be faster, in the general case, than getting the Rti, step (1) above.

Today dart2js has no table of instantiate-to-bounds results indexed by class. Where instantiate-to-bounds happens, the type is filled in by some part of the compilation chain. A table of instantiate-to-bounds for all generic classes would be the main code size cost of either Type.classOf(e) or e.runtimeType.raw. This could be compressed by having a default instantiation to Object?, but that is probably not worthwhile since then we would need a separate table for the number of type arguments.

The main implementation effort of the table would be in partitioning the table so that it is loaded incrementally with deferred loading.

davidmorgan commented 2 years ago

@rakudrama you can see how it's used here

https://github.com/google/built_value.dart/blob/master/built_value/lib/src/built_json_serializers.dart#L197

We'd like to look up a serializer by Type, but failing that we get the type name, strip off everything from the first < onwards, and look up using that. So what we rely on is that the base name uniquely identifies a type, and that the name of a generic type starts with the raw name followed by <. The names only have to stay the same for the current program execution; they're not stored.
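
A minimal sketch of that lookup, not the actual built_value code: `SerializerRegistry`, `register`, and `lookup` are hypothetical names, and a serializer is reduced to a plain string label.

```dart
class SerializerRegistry {
  final Map<Type, String> _byType = {};
  final Map<String, String> _byRawName = {};

  /// Registers [serializer] under both the exact type and its raw name.
  void register(Type type, String serializer) {
    _byType[type] = serializer;
    _byRawName[_rawName(type)] = serializer;
  }

  /// Tries the exact runtime type first, then falls back to the raw name.
  String? lookup(Object object) {
    final type = object.runtimeType;
    return _byType[type] ?? _byRawName[_rawName(type)];
  }

  /// Strips everything from the first '<' onwards: "Foo<int>" -> "Foo".
  static String _rawName(Type type) {
    final name = type.toString();
    final index = name.indexOf('<');
    return index == -1 ? name : name.substring(0, index);
  }
}

class Foo<T> {}

void main() {
  final registry = SerializerRegistry()..register(Foo, 'FooSerializer');
  // Foo<int> is not the registered Type (the raw literal is Foo<dynamic>),
  // so the string-based fallback is what finds the serializer.
  print(registry.lookup(Foo<int>()));
}
```

Every fallback lookup pays for a `toString()` on the runtime type, which is the cost this issue is about.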

Thanks.