Closed RossTate closed 4 years ago
I would like to understand more about how dispatch tags would be used.
In particular, you cannot use dispatch_tag.get on imported types. For that pattern, you should instead import the appropriate dispatch tags you need, enabling call_indirect to work across type imports in a way that respects abstraction (and subtyping because import/export signatures of dispatch tags do not have to match exactly—they can be subtypes and everything works just fine).
If I understand correctly, a module providing a function would export both the function and its dispatch tag. A module using the same function would import both the function and its dispatch tag, then supply the tag when (indirectly) calling the function. From that high-level understanding, it doesn't sound like the dispatch tag is adding anything that couldn't just be bundled with the function itself. Can you explain the nuance there?
Also, as far as I can tell it is impossible in this scheme to export a function that uses imported types in its signature and allow it to be indirectly called in another module because the module defining the function cannot create a dispatch tag for it. Is that correct? If so, that seems like a strange usability and composability wart.
Great questions.
First, I forgot to describe a step. When you import a func
, the auto-generated dispatch_func
for it (if any) is generate according to the imported signature rather than the func
's declared signature. That is, your auto-generated funcref
for that import as indirectly callable according to your own expectations. This avoids the subtyping issues on exported/imported functions in #1343. It addresses the (current) common pattern where call_indirect
is just used to indirectly call to one's own func
s (including one's imported func
s). That is, currently one's funcref
s are almost always generated from one's own func
s and imported func
s.
This means you don't need to import dispatch tags to indirectly call imported func
s, nor do you need to export dispatch tags for others to be able indirectly call exported func
s. You only need to import/export dispatch tags if you want to import/export funcref
s themselves and have indirect calls on them work successfully. For example, you might be a module for an object-oriented language that exports a class C
but not the fields of that class. That is, you export the type for C
abstractly except for a constraint that indicates it has a v-table and interface-method table. You might also provide some interface that has a method foo(C) : int
. You'll have some dispatch tag $dt_foo
for this method, whose signature is defined in terms of the concrete type for C
that you know. But when you export this dispatch tag, you'll export it with just the abstract type for C
. This means that everyone defining their own implementations of that interface—or indirectly calling that method through an object's interface-method table—will agree on the specific dispatch tag while not knowing the concrete type of C
(say by just using C
s exported class methods). At the same time, your own implementations of that method can use C
s concrete type to access C
's private fields and the like.
Hope that does a better job of illustrating the cross-module usage patterns I foresee.
But when you export this dispatch tag, you'll export it with just the abstract type for
C
.
Okay, I followed you up to here. If I export $dt_foo
, I don't expect a different tag to export, I expect that tag to export. So where does this other, restricted tag come from?
I thought I might need to clarify that. Whenever you export something, you specify what type to export it with. So if you're module's export signature is type Handle; dispatch_tag $dt_get_value : [Handle] -> [i32]
, your module can instantiate Handle
with i32
and instantiate $dt_get_value
with some dispatch tag that has associated signature [i32] -> [i32]
. That is, with type imports (and subtyping), a given dispatch tag might have multiple different signatures it can be exported as, and the choice of signature affects other modules' perspective of that dispatch tag, e.g. as something for operating on abstract Handle
s or for operating on concrete integers. The design above ensures that other modules use the dispatch tag according to the perspective consistent with its exported signature, regardless of what its internal "true" signature is.
Okay, so this is... nominal equality of function types?
Unfortunately I'm not sure what you're asking.
@taralx, I was hoping you would rephrase your question :) Sorry, I didn't mean my comment to disregard it.
In the meantime, I thought of another perspective that might help. Suppose I am a security-conscious application. I want to be very conscious about all the entry points for my program. call_indirect
is an opportunity for an indirect entry point. I can try to mitigate that by being careful about the function references that escape my program, but leaks happen, and it would be better to just not have that entry point to begin with.
One option is to just not have any dispatch tags associated with any of my functions. However, that means even I can't call_indirect
my own functions, which would be a huge problem for many C/C++ programs. So this proposal provides the option to use dispatch_tag.new
for every signature I use and associate each of my functions with the respective new tag. Then, so long as I don't explicitly export those dispatch tags, I know I'm the only one who can call_indirect
into my functions. Furthermore, if I have subcomponents of my program that are supposed to be separable, I can even make new tags at a finer granularity for each subcomponent to ensure that a call_indirect
in one subcomponent cannot call into another subcomponent. Thus, the proposal makes it easier for me to restrict the scope of call_indirect
, both from other modules and even within my own module.
@RossTate I like that way of thinking about it! It's very accessible. On a related note, could you add a note about the motivation behind this proposal to the beginning of the opening post? I'm afraid that anyone who has not been following this broader discussion closely would be missing the context of what problem this proposal aims to solve. I think the issue deserves a wider discussion than we've had so far once @rossberg's ideas on newtype
-style abstraction are written up.
Thanks for the feedback, @tlively! Suggestions on how to make ideas more accessible are always very useful. I rewrite the original post per your suggestion and to better preempt some of the confusion above. Please let me know if it's actually an improvement, if you have the time.
Thanks, @RossTate! I think this is an improvement.
One thing I'm not sold on is the extension that introduces dispatch_func
s. That part of the proposal seems like a much stranger addition to the spec than dispatch dispatch tags and it seems like it is motivated by a particularly narrow interface dispatch use case. Could you share how the interface dispatch mechanism you describe would be implemented without dispatch_func
so we can directly evaluate what dispatch_func
gets us?
Also, the opening post is growing rather long, so it might be easier to discuss it if it were a PR we could make inline comments on. @binji, do you think you could create a phase 0 repo for this proposal? The process document says that proposals get a repo in phase 1, but all the other listed phase 0 proposals have their own repos already. Alternatively we could just move this to phase 1, but it would be prudent to get more eyes on it first.
I could make a dispatch-tags
repo in the SOIL Initiative's account and put a PR into there if that'd be easier for people to provide feedback on.
Could you share how the interface dispatch mechanism you describe would be implemented without dispatch_func so we can directly evaluate what dispatch_func gets us?
So I believe the next-best option these days is still interface tables. With this, each class's v-table has a reference to an "interface table", which in WebAssembly would likely have to be implemented as an array of references to "interface-method tables". (Production implementations often use an internal pointer to an array of internal pointers all into the v-table to get better cache performance.) Each interface specifies its own interface-method-table layout based on the methods it has, and each class implementing an interface generates an instance of that layout with pointers to how it implements the respective methods. The class's interface table then has an entry for each interface implemented by the class.
An interface-method invocation then grabs the interface table from the object's v-table, iterates through the interface-method tables until it finds the one matching the interface that the method being invoked belongs to, and then loads the appropriate code pointer from that interface-method table.
So in the optimal case here you have load v-table > load i-table > (load im-table-type > check for match)+ > load im-table > load code pointer > call
, as opposed to the design using dispatch_func
where you have load v-table > load code pointer > push dispatch tag > call
. Of course, if you have code that you know makes repeated calls to methods of the same interface on the same method, you can remember the object's interface-method table. In this ideal case, the interface-table approach can just slightly outperform the one using dispatch_func
because it has no need to push the dispatch tag, but even that gap can be closed with some host-side optimization (for repeated calls to the same funcref
with the same dispatch tag, the host can remember the intermediate direct function call that the call_indirect
resolves to).
(For a sense of scale, Java's ArrayList
class implements 5 interfaces, and C#'s List<T>
class implements 8 interfaces.)
it seems like it is motivated by a particularly narrow interface dispatch use case
This is just the best-known application of this feature. In our research investigating optimizations for AOT-compiled (Python-like) dynamic OO languages, we found that this can be really beneficial to untyped method dispatch. We give every object a 19-slot method table, and every method name has some slot number according to the name's hash. The method implementation's associated with the object at allocation time are put into this table using dispatch_func
s just like interface-method implementations above, and methods are invoked using call_funcref
just like interface methods above. However, instead of the dispatch_func
trapping when no match is found, instead a more general-purpose handler is called, and this handler looks into the object's dictionary for methods that may have been added afterwards. This application requires another extension, albeit one still built on the same underlying principle, but we found it made a huge difference in performance for at least our research language.
@RossTate I've been pretty focused on call_indirect
's type equality recently. For MVP, equality is very simple because the space of function types is very simple. Your "dispatch tags" seem to be equivalent to naming function types, so the same underlying type can have two names that are not equal as far as call_indirect
is concerned.
@taralx, assuming I'm reading you correctly, I think you have the right intuition.
Thanks for that explanation, @RossTate! It sounds like the extension could be useful, but could also be easily deferred to a follow-on proposal if dispatch tags are standardized. For the sake of focusing on fewer things at once and moving one step at a time, I suggest focusing further discussion with the wider CG on just the dispatch tags without the extension to dispatch functions for now.
Thanks for the feedback, @tlively! Sorry for the hiatus—I was consulting to get some more insight into JVM implementations. Unfortunately, there wasn't much insight to be gained because interface dispatch is only done dynamically on megamorphic calls, and those are infrequent enough that they're not heavily optimized. So no one really knows what to expect for JVM language performance on a platform without (bimorphic) inline caching. I'd like to keep dispatch_func
s on the radar because people thought they might turn out to be very important, but indeed they do not need to be the focus.
Given that, it sounds like you're suggesting I take the revisions from the discussion above and "officially" post a Phase 0 Proposal?
Given that, it sounds like you're suggesting I take the revisions from the discussion above and "officially" post a Phase 0 Proposal?
This is the phase 0 proposal, so I suggest polling to move to phase 1 :)
Also, @RossTate, looking back on this issue, I still think that the opening post is missing an opening problem statement explaining the problem with call_indirect that this solves. If you could add a brief explanation at the beginning of the opening post similar to the one at https://github.com/WebAssembly/function-references/pull/33/files#diff-877ebb3581ffcc2b4fe25c7b251ac081R351-R360, I think that would make this proposal more accessible.
Also, the opening post is growing rather long, so it might be easier to discuss it if it were a PR we could make inline comments on. @binji, do you think you could create a phase 0 repo for this proposal?
Sorry, I completely missed the mention here. I agree with your later suggestion that we go for phase 1. I think some phase 0 proposals have repos, but I think that's just a historical mistake (along with confusion about phase 0 vs. phase 1).
Thanks for the suggestion, @tlively! I've added a problem summary. It also inspired me to add two new application sections: optimization and subtyping. Hopefully that's the improvement you were looking for.
@binji, I'll add this as a CG-meeting item then. However, right now @tlively and I already have items for the upcoming meeting, and I want to leave space for someone else. If no one comes along, then I'll take the open slot, but otherwise I'll defer it the following meeting.
Thanks, the new problem section is very helpful 👍
_Any function can be called from anywhere in the code with a callindirect. This makes it difficult to optimize functions, and it poses a security risk as one cannot statically ensure access to key internal functions is restricted. Only functions that added to the table can be called with call_indirect.
Instead of saying that there is presently a "security risk," it would be better to describe the security property that dispatch tags allow us to enforce.
IIUC, dispatch tags let us prove that a funcref (whether in the table or otherwise) does not escape a module in such a way that it may be called by some other module. This is not possible to prove in general under any set of other proposals. It is worth having a separate discussion about how useful this property is in practice.
Thanks for catching the bug, and thanks for the suggested fix!
Adding to the still-open agenda. Also, added another application of dispatch_func
that came up in a discussion of how to implement OCaml in WebAssembly.
Last minute thought: would the term "Call Tag" be better? Seems more in line with WebAssembly terminology.
How about "fun cap"? It's the name of a certain kind of capability.
On Wed, Jul 22, 2020 at 8:46 AM Ross Tate notifications@github.com wrote:
Last minute thought: would the term "Call Tag" be better? Seems more in line with WebAssembly terminology.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/WebAssembly/design/issues/1346#issuecomment-662530084, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAQAXUFE2Q6337FXHWRPBLDR44CUDANCNFSM4NHCPPBQ .
-- Francis McCabe SWE
@RossTate I think "call tag" is a good name. You're right that so far the spec doesn't have anything called "dispatch."
Okay. @binji, supposing you also like the name change, can you use call-tags
for the new repo when you have a chance to set it up?
Oops, I used dispatch-tags without looking at this. I'll change it to call-tags.
No worries. Thanks!
Alright, I used the text here to seed the overview text in the new repo. Thanks everyone for all your help!!!
Problem
call_indirect
imposes a number of challenges (each discussed in more detail below), both currently and with respect to how WebAssembly might evolve:call_indirect
. One has to do some analysis to check if a reference to that function is made. And once a function reference has been made, it can be very difficult to ensure even something as simple as the function is only accessed by its module. This makes it difficult to optimize functions or to ensure they are accessed in a way that respects the module's intended invariants.call_indirect
needs to compare caller and callee signatures using the "run-time value" of imported/exported types. This can be exploited to convert imported/exported types to and from the type they are supposed to be abstracting. Thuscall_indirect
can be used to access information that would otherwise be concealed behind an abstract type, or to create values of an abstract type that would otherwise be unforgeable.Solution
One way to implement
call_indirect
is to push all the arguments onto the stack and then to push a signature descriptor onto the stack; the callee then pops that descriptor and compares it for (bitwise) equality with the callee's own signature descriptor. If the two match, then the invariants guaranteed by the static type-checker ensure the stack has the arguments expected by the function (and similarly that the caller is expecting the values that will be returned by the function), so the function call proceeds. If they don't match, then it traps.We can solve all the problems above by generalizing this process to what I'll call dispatch tags (though I'm not at all tied to that name). That is, a dispatch tag is a generalization of these signature descriptors.
Defining Dispatch Tags
Somewhere in one of the module's header sections, the module would declare dispatch tags as in
dispatch_tag.new $dt1 : [i32] -> [i32]
. This declaration establishes a new dispatch tag$dt1
that only the module has access to (unless it exports the tag), and which has associated signature[i32] -> [i32]
. The module could also declare dispatch tags as indispatch_tag.canon $dt2 : [i32] -> [i32]
that defines$dt2
to be the canonical dispatch tag for the signature[i32] -> [i32]
. This canon is one that all modules have access to and is uniquely determined by the signature. (Historical note for understanding comments below:dispatch_tag.canon
was formerly nameddispatch_tag.get
.)Associating Dispatch Tags
When a module defines a function it can specify the dispatch tag(s) that its associated
funcref
should accept. (By default, specifically the result ofdispatch.canon
on the signature is associated, making this backwards compatible.) So a module can explicitly specify no tags if it doesn't want the function to be implicitly indirectly callable, or somenew
tag if it only wants it to be indirectly callable with the module's own tag, or multiple tags if it wants to mix-and-match dispatching conventions. Of course, these associated dispatch tags have to be compatible with the function's signature.Calling with Dispatch Tags
call_indirect
would then have a variant, saycall_funcref $dt
that specifies what dispatch tag$dt
to use. If the dispatch tag$dt
has associated signature type[t1*] -> [t2*]
, thencall_funcref $dt
has type[t1* funcref] -> [t2*]
. Thus the provided inputs and expected outputs are type-checked to match the signature associated with the dispatch tag. The currentcall_indirect
becomes just a shorthand for first getting thefuncref
from the appropriate table and then usingcall_funcref
with the dispatch tag resulting fromdispatch_tag.canon
on the expected signature.The execution of
call_funcref $dt
on a givenfuncref
for some function succeeds when the dispatch tag$dt
is one of the function's specified dispatch tags. So if the function specified just one dispatch tag, then a simply bitwise-equality check is done on the two tags at run time. If a match is found, then this implies that the arguments are valid inputs to the function and that the returned values are acceptable outputs from the function. However, by making dispatch tags explicit, it also becomes clear that compatibility of inputs and outputs does not imply the call will succeed because the dispatch tags might (intentionally) not match even if they have the same signature.Applications
Security
To see one way this can be useful for WebAssembly, suppose I am a security-conscious application. I want to be very conscious about all the entry points for my program.
call_indirect
is an opportunity for an unintended entry point. I can try to mitigate that by being careful about the function references that escape my program, but leaks happen, and it would be better to just not have these unintended entry points to begin with.One option is to make it possible to not have any dispatch tags associated with any of my functions. However, that means even I can't
call_indirect
my own functions, which would be a huge problem for many C/C++ programs. So this proposal provides the option to usedispatch_tag.new
for every signature I use and associate each of my functions with the respective new tag. Then, so long as I don't explicitly export those dispatch tags, I know I'm the only one who cancall_indirect
into my functions. Furthermore, if I have subcomponents of my program that are supposed to be separable, I can even make new tags at a finer granularity for each subcomponent to ensure that acall_indirect
in one subcomponent cannot call into another subcomponent. Thus, the proposal makes it easier for me to restrict the scope ofcall_indirect
, both from other modules and even within my own module.Optimization
If a function has no associated dispatch tags, then it is only directly callable, enabling many optimizations. Even if a function does have an associated dispatch tag, if that tag is neither imported nor exported than the optimizer can guarantee that the function can only be successfully called from within the current module instance, and furthermore it can only can be indirectly called from occurrences of
call_funcref
using one of the associated dispatch tags. Similarly, acall_funcref
using a dispatch tag that is neither imported nor exported eliminates the possibility that a successful call might require an module-instance switch.Subtyping
When a function specifies associated dispatch tags, the types of those dispatch tags need only be compatible with the functions signature. Compatibility here can easily incorporate subtyping. That is, so long as a dispatch tag's input types are subtypes of the input types of the function, and the output types of the function are subtypes of the output types of the dispatch tag, then an indirect call using that dispatch tag is completely sound. This then supports the surface-level feature of various typed OO languages where a subclass/subinterface can refine the signature of a method to either accept a broader range of inputs or (more often) produce a more precise output.
Abstraction
Many of the above languages would want modules to be able to export a class
C
without exporting its implementation (or at least its private fields). Dispatch tags are designed to support this pattern and yet avoid the problems outlined in #1343 whereincall_indirect
can be used to expose and get direct access to a module's concrete implementation of an exported abstract type.The Type Imports is still young, so for the sake of conversation suppose that exports are done by (1) specifying a module signature and then (2) specifying how to instantiate the various components of the signature with the module's various definitions. So part 1 might say there is a type
C_type
with no specifics about the type, and then part 2 might say thatC_type
representsref (struct i32 i32)
.In this setting, suppose some surface-level interface method
foo
defined in the module conceptually takes aC
and returns an integer. Using the implementation of interface-method dispatch above, the module would define a dispatch tag$dt_foo_internal
with signature[object_impl (ref (struct i32 i32))] -> [i32]
(where theobject_impl
is thethis
pointer). The module's own implementations of this method are allowed to know the specific implementation of classC
, which is why this dispatch tag uses the type(ref (struct i32 i32))
in its signature.However, although other module's are allowed to provide their own implementations of the surface-level interface method
foo
, they are not allowed to know the implementation ofC
. So in the module's exports, part 1 would specify a dispatch tag$dt_foo : [object_impl C_type] -> [i32]
, and then part 2 would instantiate that tag with$dt_foo_internal
. Internally, this instantiation is valid becauseC_type
was instantiated withref (struct i32 i32)
, but externally other modules know nothing aboutC_type
. Nonetheless, they can use$dt_foo
to invokefoo
on objects and to provide their own implementations offoo
just like they can any other interface method.This pattern enables (controlled) indirect calls across modules, but in a way that respects abstraction. In particular, unlike in #1343, this design can prevent using indirect calls to get direct access to a module's implementation of an abstract type or to forge values of an abstract type. There is just one restriction that needs to be made:
dispatch_tag.canon
must only be allowed for signatures comprised solely of uninstantiable types, such asi32
,i64,
f32,f64
, and—per the discussion in #1343—externref
. That is, it is the ability to generate canonical dispatch tags from abstract types that violates abstraction. (This observation seems to extend more generally to processes that would generate canonical values from types where the values are equal only if the types are equal.)Extension
In addition to direct functions, we could let a module declare a number of dispatch (or indirect?) functions that look like the following:
An indirect call to
$df
checks if the dispatch tag provided at run time is (bitwise) equal with any of$dt1
, .... If it matches$dtn
, then a direct call to$fn
is made (in theory; in practice, this call might be inlined). If no match is found, then the indirect call traps. Type-checking simply involves checking that the signature of each$dtn
is compatible with that of$f1
.Given a
dispatch_func
, one makes afuncref
viaref.func $df
. That might seem odd because previouslyref.func
took afunc
identifier. The disconnect is becausefunc
is currently doing two things: defining a direct function and defining adispatch_func
that calls that function on each of the associated dispatch tags. So everyfunc
identifier is also adispatch_func
identifier, making this change in perspective still backwards compatible.Applications
Interfaces
At present,
call_indirect
is primarily used for untyped (from wasm's perspective) function calls. But we can take the concept of dispatch tags further to support more language features. In particular, this same pattern shows up in interface-method dispatch for languages with multiple inheritance of interfaces. Interface-method dispatch is a critical feature of various popular languages, and good performance for this feature is often achieved through JITing techniques that WebAssembly is aiming to not rely on.The
dispatch_func
extension means that the behavior of afuncref
can depend on the specific dispatch tag used (beyond just matching versus trapping). Using this, we can support what I believe is still the best (by far?) performing non-JITing implementation of interface-method dispatch (and other lesser-known forms of dynamic dispatch) for many languages with multiple inheritance of interfaces. For context, one way to implement interface-method dispatch is to have every v-table have an array of, say, 19 slots, and to assign to every interface method some slot number. Unfortunately, it is possible that an object implements multiple interface methods with the same slot number. So when a interface-method call is made, in addition to the arguments the caller pushes onto the stack the identifier of the interface method (i.e. its dispatch tag) and then calls the function in the matching slot. That function then switches on that identifier and redirect to the appropriate implementation, just like adispatch_func
.It would be very difficult to implement this pattern without direct support from WebAssembly because two interface methods assigned to the same slot can have completely different signatures, i.e. number and size of arguments. So dispatch tags enables an important pattern used in practice to support a feature that is critical for many major languages (specifically Java, Kotlin, and Scala come to mind, though not C# due to its decision to support multiple-instantiation inheritance of generic interfaces).
Dynamic Arity
In functional languages, a value of type
a -> b -> c
(where each letter is a type variable) is a closure of unknown arity. Due to currying, it could be a closure expecting ana
that then will return a closure expecting ab
that then will return ac
. Due to first-class functions, thec
itself could represent a function type, so it could even be the case that this is a ternary closure that has been curried.A functional language could implement closure application by having each closure specify its arity and by having each caller case on this arity. Or a functional language could use
dispatch_func
for a closure'sfuncref
and have each caller use a dispatch tag for the arity at hand which then thedispatch_func
cases on to provide the appropriate functionality. The latter is moderately more efficient, but given the frequency of function applications in functional languages, that moderate improvement would likely be notable.Forwards-Compatibility
I won't go into detail here, but I've done some work to look ahead and this seems to be compatible with features like parameterized (i.e. generic) interfaces and polymorphic functions/methods as well as existential types to eliminate superfluous casts.