call_indirect and function subtyping

manoskouk commented 2 years ago

It is my understanding that function subtyping forces call_indirect to perform a runtime subtyping check. Can we confirm this? On a related note, this check might impact MVP modules: consider an MVP module defining the type t1 = int32 -> int32 and using it as immediate signature for call_indirect. Despite this type not being nontrivially extensible, call_indirect still has to perform a runtime check: another module might always later define t1' = int32 -> int32 and t2 = int32 -> int32 <: t1'; t1, t1' canonicalize to the same type, therefore t2 <: t1. This should not impact running time of MVP modules (since we can include a fast path that checks for signature equality); however, it will impact binary code size. It would be unfortunate if wasm-gc ends up having impact on existing modules. Would it be reasonable to somehow consider MVP types final by default?

rossberg commented 2 years ago

So far, the proposal errs on the side conservatism and does not change the semantics of call_indirect, i.e., it still requires equivalent types. At last week's meeting, there seems to have been agreement in principle that we'd rather want to allow subtyping there, but the impact on performance is not sufficiently clear yet.

The fact that the engine can't locally know whether a type has subtypes is one of the reasons for the concern. I mentioned that a notion of final could solve that, but the proposal does not currently have that, so would require extending it.

But I think we should experiment and measure the actual overhead first, otherwise it'd be premature optimisation.

manoskouk commented 2 years ago

Thanks for the reply. I am worried though that the conservative approach breaks subsumption in the presence of table.set.

rossberg commented 2 years ago

Technically it doesn't, since runtime typing is separate from static typing. This would be clearer if RTTs weren't swept under the rug. ;) The type check for call_indirect is essentially a cast, and as I have noted in other contexts, there is no reason to expect that runtime casts can always reverse subsumption. The only guarantee you have is that if a dynamic subtype check succeeds, then the corresponding static subtyping relation holds; the inverse is not true.

In your example, you have t2 <: t1 <: func, and a function with runtime type t2 but static type func that you are trying to cast down to t1. For practical and philosophical reasons, I would much prefer if this succeeded, but it's not required for any theoretical or technical reason.

titzer commented 2 years ago

I think we should be complete and make call_indirect do a proper subtyping check. From what I remember of doing the V8 implementation, there are enough instructions generated for the bounds check and signature check that I considered pulling it out into a per-signature stub, in which case the additional slowpath checks for the subtype check would be less of an issue.

eqrion commented 2 years ago

A proper subtyping check in call_indirect would likely be reasonable for SpiderMonkey to implement. We have the check on the callee side (so binary bloat wouldn't be too bad), and could fast-path the case where signatures are equal.

Regarding final and MVP function types, a mild suggestion could be to implicitly make function-types where all params and results are number-types final, because we'll never add a subtyping relation to those val-types, and so any declared subtyping would seem to be pointless. Then for those function-types, type identity would imply subtyping. I don't know how critical this is though.

rossberg commented 2 years ago

Mh, if it comes to that, I'd much rather have all function types default to final uniformly than introducing irregular rules.

manoskouk commented 2 years ago

@rossberg thanks for the explanation.

Another motivation for having subtypes succeed is that we can drop the type check altogether if the immediate type coincides with the table's type.

RossTate commented 1 year ago

This seems like it would have a very inefficient slow path, and it might even noticeably slow down the fast path. It's also not clear how it would extend to programs using Post-MVP features (where the subtyping between, e.g. existential types representing objects, has to be explicitly proven by the producer rather than automatically determined by the engine).

I have two alternate solutions, one short-term and one long-term.

The short-term solution is to have a variant of ref.func wherein one specifies a list of signatures the indirect call's signature should match exactly with. This covers the whole-program scenarios the GC MVP is targeting.

The long-term solution is to use call tags with fallback behavior. This covers separate-compilation scenarios.

rossberg commented 1 year ago

@RossTate, as mentioned various times before, adding new constructs, as useful as they might be, does not address the question at hand, which is concerned with the semantics of an existing construct and its interaction with subtyping.

Also, if we allow subtyping for call_indirect, then it would effectively become an exact macro for table.get + ref.cast + call_ref. So whatever post-MVP questions might arise wouldn't be specific to this instruction – to the contrary, such a uniform semantics would avoid the need for specialised answers.

titzer commented 1 year ago

I agree with @rossberg; without function subtyping, call_indirect has special, surprising semantics.

It's worth noting that the "slowpath" of checking for function subtyping would otherwise be a trap. The fast path remains an identity check, and thus should not slow down any program that exists today, nor any program that doesn't rely on function subtyping.

RossTate commented 1 year ago

If you want it to work with subtyping, then solve the problem the same way you did for structs: use isorecursive types and implement function subtyping according to the explicit hierarchy rather than component-wise. This has the benefits of being more consistent with the rest of the GC proposal's static sub typing and dynamic casting designs, of supporting recursive function types, and of being efficiently implementable even on the slow path.

jakobkummerow commented 1 year ago

I didn't think there was any disagreement that function subtyping for call_indirect, if we allow it, would use the same explicit subtyping mechanism as the rest of the GC proposal...?

RossTate commented 1 year ago

Ah, then I misunderstood some of the earlier comments. Sorry.

Then to address this issue:

On a related note, this check might impact MVP modules: consider an MVP module defining the type t1 = int32 -> int32 and using it as immediate signature for call_indirect.

I would recommend that types not in recursion groups be final.

askeksa-google commented 1 year ago

I would recommend that types not in recursion groups be final.

Are you recommending that as a difference between types not in recursion groups and types in singleton recursion groups, as per the discussion in https://github.com/WebAssembly/gc/issues/334?

It's certainly useful for types that are not recursive to be used as supertypes. Would this suggestion then mean that in order to be used as supertypes, such types would have to be places in recursion groups? Wouldn't that conflict with other potential distinctions such as the ones @rossberg mentioned in https://github.com/WebAssembly/gc/issues/334#issuecomment-1309058991?

rossberg commented 1 year ago

I would recommend that types not in recursion groups be final.

That strikes me as completely artificial conflation of concerns. If we wanted final, then we should be introducing it as an explicit feature, as discussed before. The use case for it is wholly unrelated to whether a type is recursive. Also, we already concluded that we need actual performance data before considering final.

RossTate commented 1 year ago

Are you recommending that as a difference between types not in recursion groups and types in singleton recursion groups, as per the discussion in https://github.com/WebAssembly/gc/issues/334?

Yes. Or you could think of a type not in a recursion group as the same thing as a type in singleton recursion group and labelled final.

That strikes me as completely artificial conflation of concerns.

The concern expressed in the OP is that having existing function types not be final will incur costs for programs not using function subtyping. We know it will at least cost memory; it might also cost run-time performance. Having existing function types be final addresses that concern. Another way to think about it is that types are final by default and it is extensibility that needs to be explicit. This is how it works in some systems (e.g. Kotlin), and applying that to the MVP for struct types (i.e. struct types even in a recursion group with no subtypes are not extensible) would enable better casts for MVP GC as well (especially since MVP is focusing on whole-program compilation).

rossberg commented 1 year ago

If we decide to introduce final then we can also make it the default, but that's still completely unrelated to type recursion.

RossTate commented 1 year ago

Come to think of it, for the MVP, the sub clauses could be restricted to types declared in the current group. Then these groups can be thought of as defining subtyping hierarchies (that happen to permit recursion). (As a Post-MVP feature, you could declare some types as "extensible", in which case they can be referred to by sub clauses, but that seems to be only helpful for separate compilation and so not necessary for the MVP.) That would address the issues here, enable more optimizations for the MVP, and still offer a nice interpretation of what these groups are.

rossberg commented 1 year ago

That still conflates unrelated issues and obviously would break even the simplest form of separate compilation that we aimed to support.

conrad-watt commented 1 year ago

At this point I can no longer tell if you're just trolling.

I think we can express our opinions that a comment is unproductive or unactionable in a way that avoids this kind of language. @rossberg would you be ok with editing out this sentence?

RossTate commented 1 year ago

I honestly feel like I offered a suggestion that aligns well with how the MVP is currently being used, addresses the concerns multiple people raised here, addresses the concerns you and others raised about an earlier suggestion, and provides a clear extension for separate compilation (which could be integrated now if the group chooses). That seems like a productive thing to do, so I am at a loss as to why I am being attacked for it.

titzer commented 1 year ago

Forcing supertypes to be in the same recursion group conflicts with minimizing the size of recursion groups, which is one of the enablers for separate compilation. I don't understand the motivation for the restriction nor what benefit it would give. It just seems arbitrary, and highly likely something we'd labor to relax post-MVP.

edit to add: In general, given that we've reached Phase 3, I think tweaks to the MVP need to be strongly motivated by an important use case encountered in a real language targeting the existing MVP, or real performance data from a production engine implementing the MVP.

tlively commented 1 year ago

The J2CL and Dart teams have indicated that they may need to start looking at inter-module coordination sooner rather than later, so we (Google) don't want to place any restrictions on recursion groups that would hinder that use case.

RossTate commented 1 year ago

Forcing supertypes to be in the same recursion group conflicts with minimizing the size of recursion groups, which is one of the enablers for separate compilation.

As I mentioned, a sub can mention any type in the same group or any type marked as "extensible". So you can still do the same minimization that's currently possible; you just have to explicitly mark the relevant types as extensible.

I don't understand the motivation for the restriction nor what benefit it would give.

Every cast to a non-extensible type with no subtypes in the group would be guaranteed to be exact. That would let call_indirect be implemented for all core-wasm function types just as it is now. That would also let casts to such "final" types be optimized.

In general, given that we've reached Phase 3, I think tweaks to the MVP need to be strongly motivated by an important use case encountered in a real language targeting the existing MVP, or real performance data from a production engine implementing the MVP.

This thread is about making a change to call_indirect. This change is not necessary to support the use cases. When a subclass overrides a superclass with a broader signature, an entry for the broader signature can be added to the v-table, and the entry for the superclass's method can be filled with a stub with the narrower signature that simply (tail) calls the implementation with the broader signature.

So if you are going to apply this reasoning about Phase 3, then you should be demanding that people demonstrate this causes substantial performance problems before tweaking the MVP to support a different implementation of the pattern. (I am not saying that should be necessary; just pointing out the inconsistent application of this principle.)

rossberg commented 1 year ago

@conrad-watt:

@rossberg would you be ok with editing out this sentence?

Let me sleep over that. ;)

titzer commented 1 year ago

So if you are going to apply this reasoning about Phase 3, then you should be demanding that people demonstrate this causes substantial performance problems before tweaking the MVP to support a different implementation of the pattern. (I am not saying that should be necessary; just pointing out the inconsistent application of this principle.)

No, MVP is the null hypothesis. The burden of proof is on proposed changes to MVP. I am increasingly uncomfortable with how this conversation is starting to repeat patterns we've managed to break out of, particularly with somewhat vague speculations. We're fully in empirical mode, and that's gotten us lots of progress.

RossTate commented 1 year ago

The burden of proof is on proposed changes to MVP.

Then can you point me to the empirical evidence you have suggesting that the MVP needs to be changed to support sub typing in call_indirect, rather than employing the technique I provided above? If so, that would be enlightening.

titzer commented 1 year ago

AFAICT it was always the intention that call_indirect would support subtyping and not including it was an oversight. So we are actually finishing the MVP design, rather than departing from it.

RossTate commented 1 year ago

The reason that wasn't done before is because there were concerns about performance. As such, I'm offering a change to type validation that ensures core wasm performs as before. (And a small extension to support separate compilation where the "later" units can add more types to the hierarchy.)

A fine response is "It is useful to have that option on the table. We will evaluate the impact of not using that option has on core wasm modules, in terms of both memory and time, and present the results to the CG. Should they be dissatisfied, then this might be an option we build upon. But at present we would rather gamble on the changes being unnecessary."

manoskouk commented 1 year ago

I ran some benchmarks to see the code-size impact of call_indirect subtyping on linear-wasm modules. I found it to be anything from 0.17% up to 10%. I think this is unacceptable and we should lean towards interpreting current signature types as final.

askeksa-google commented 1 year ago

Given how prominently casting overhead has featured in our performance discussions, it makes sense to consider every little thing that could potentially improve that situation. Final types could be such a thing, even if it only cuts away code that would never be executed anyway (meaning it could help on compilation speed, runtime memory usage and instruction cache pressure).

With just three different kinds of deftypes (so far), it seems natural to assign separate byte values for the final and extensible versions of each. From a history-less point of view on the spec there would thus be no concept of which one is "default". But as @RossTate and @manoskouk suggest, it would make sense - at least if we want subtyping capability for call_indirect - to make the current encoding of a function type definition (0x60) take the role of "final function type definition" in the new spec that allows both final and extensible types.

If we decide to make all types extensible (as currently specified), we can't later go back and make them final. This is an argument for considering final types for the MVP.

I strongly agree with @rossberg that final vs. extensible should be specified completely independently of rec groups. These are orthogonal concepts.

RossTate commented 1 year ago

That makes sense. Essentially the current code for function types is specifically final function types, whereas the current code for structure types is specifically extensible structure types.

askeksa-google commented 1 year ago

That makes sense. Essentially the current code for function types is specifically final function types, whereas the current code for structure types is specifically extensible structure types.

Hmm, it seems to me you are conflating two different uses of "current" here.

The code for function types in the currently finalized version of Wasm (before GC) becomes "final function type". Which codes we choose for extensible function type and final/extensible struct/array type doesn't matter, as we have nothing to be compatible with on that front.

It may be a tricky transition to go from the current WasmGC where 0x60 means "extensible function type" to one where it means "final function type", but that should not hold us back.

RossTate commented 1 year ago

I think we're on the same page. The important thing is that this discussion now seems to be focusing on how to address the issue in the OP, rather than why to not bother, and y'all are exploring some good pay-as-you-go solutions. Thanks, @manoskouk, for validating the concern!

conrad-watt commented 1 year ago

With respect @RossTate, I hope you can take some lessons from this thread to improve your interactions with the group in future. Your comments generated a lot of noise in a situation where there was no disagreement from the outset that:

we can add a concept of final to the GC proposal
whether or not we should will depend on empirical considerations such as those brought up by @manoskouk

dcodeIO commented 1 year ago

Objection: Double standards, reversal of blame

titzer commented 1 year ago

For the record, I was expecting that the plan of record was to 1.) add function subtyping to call_indirect and then 2.) gather feedback on the performance/size metrics from real engines. I assume we'll be following that with 3.) investigate alternative implementations to mitigate overhead and 4.) explore design options to mitigate overhead.

Thanks @manoskouk for gathering the data in step 2. AFAICT the empirical evaluation has advanced the discussion and I appreciate it!

tlively commented 1 year ago

@manoskouk, do you have further work planned to try to bring the cost of subtying in call_indirect down, or will solving this require a spec change?

manoskouk commented 1 year ago

I would be very happy with any ideas on how to reduce the cost. Currently, the only idea that has come to my attention is to compile call_indirect for wasm-linear signature types as though they are final, and invalidate the generated code if such type happens to be extended. I think this is an overly complex solution and I don't see us going down that path anytime soon.

titzer commented 1 year ago

Two ideas: 1.) move the entire call_indirect sequence out-of-line to a per-signature shared stub[1]. 2.) move the exact-match-failed slowpath (i.e. subtyping) out-of-line (perhaps into a shared stub).

[1] I wanted to do this in V8 for years.

conrad-watt commented 1 year ago

@manoskouk thanks again for your experiments! I agree that solution seems over-complicated. Do you have a sense of what kind of code is associated with the 10% executable binary size overhead? Was this a deliberately-engineered pathological program, or something from an existing codebase? I'd be particularly interested in knowing how bad the situation is for compiled C++ programs with lots of virtual function calls.

I'm sure others can offer their opinions on this, but if the "representative" overhead were closer to 0.2% than 10% I'd feel more comfortable with delaying a "final" extension to the type system. That being said if everyone is willing to put the effort in upfront, it's probably fine to add "final" now and make it the default.

RossTate commented 1 year ago

@conrad-watt Your comment is completely inappropriate, as well as blatantly unnecessary (as I was leaving the discussion). This comment and the one preceding it made it sound like the reason y'all were planning to advance a change that would hinder core-wasm programs rather than just making functions final was because y'all were missing alternate ways to support use cases demanding funcrefs that could accept multiple signatures. So I provided some. One I knew performs well from related works. Another required no changes to the binary encoding, just to type validation. I legitimately thought these would be helpful options because they would resolve what I thought was a perceived conflict between two design concerns.

I would like to leave this discussion. Please do not use a public forum to make more condescending attacks disguised "with respect" while not making any effort to understand my perspective. You could have reached out to me privately, either to understand why I did what I did, or to offer what you maybe legitimately believed were valuable lessons for me to learn.

Everyone, I request that the remainder of the discussion proceed as if neither this comment nor the comment it responds to were made. I sincerely meant to advance the technical options available to y'all, and I regret making that effort.

conrad-watt commented 1 year ago

@RossTate if you, or others, have concerns about how this discussion is being moderated, please directly contact the chairs or the W3C ombuds.

manoskouk commented 1 year ago

@conrad-watt These were existing programs that we use as performance benchmarks for V8. Two of them are real-world programs for which I got 7.5% and 10% for the optimized tier. Regarding additional effort, unless I'm missing something, the only obligatory change is a check during subtype validation, and the rest is optional optimizations. @titzer Thanks for the feedback. We could certainly try that, but I am worried that the runtime impact on wasm-gc programs with call_indirect will be significant.

askeksa-google commented 1 year ago

Regarding additional effort, unless I'm missing something, the only obligatory change is a check during subtype validation, and the rest is optional optimizations.

IIUC, the iso-recursive type canonicalization would also need to distinguish between final and extensible types.

titzer commented 1 year ago

Yes, canonicalization has to distinguish finality, otherwise a final definition could get canonicalized with non-final, and thus later be used as a supertype.

rossberg commented 1 year ago

@manoskouk:

I ran some benchmarks to see the code-size impact of call_indirect subtyping on linear-wasm modules. I found it to be anything from 0.17% up to 10%.

10% sounds like a lot, I have difficulties explaining a number like that. Can you provide some statistics on the frequency of call_indirect in that code, and background on what code sequence you needed to add?

If a subtype check requires that much code, then I would imagine that other casts lead to a lot of code blow-up, too.

manoskouk commented 1 year ago

In this benchmark, call_indirect amounts for about 0.85% of instructions. You are right that due to cacheable code and other technical reasons specific to call_indirect, this code is quite bloated in V8. I would appreciate other engines reporting their own numbers. However the question is, how much overhead is acceptable? Say we drop it to 2%, would that be preferable over the spec change?

titzer commented 1 year ago

@manoskouk Have you measured any execution time overhead? IIUC the above reported results were code size only.

rossberg commented 1 year ago

Good question. I'd say that depends more on the average than the worst case. 2% as worst case is probably acceptable, as long as the common case is only a fraction of that.

WebAssembly / gc

call_indirect and function subtyping #329