WebAssembly / conditional-sections

Other
14 stars 7 forks source link

Unstable indices #21

Open sunfishcode opened 4 years ago

sunfishcode commented 4 years ago

The only way to statically identify functions, tables, memories, and globals within a wasm module in is by index, and conditional sections can mean that indices can be dependent on instantiation-time resolution. This complicates some analysis and transformation use cases.

Consider a wasm transformation that takes a module, inserts a function, and then inserts some calls to that function. With wasm today, this is relatively straightforward, and there are multiple tools that can do it. But with conditional sections, doing this correctly in general can get complex, because you can't easily insert calls which reliably name the function you inserted.

Or, consider an analysis tool that wants to do program slice analysis, for example, looking backwards from every WASI call which uses a filesystem path. With conditional sections, it's not possible, in general, to identify which calls in a module are calls to filesystem functions without knowing the conditions that will be selected at instantiation time.

It's worth pointing out that this second problem also occurs with call_indirect -- if you place a function in a table, potentially any call_indirect function in the program can call it. A key difference though is that tables only contain functions that have their address taken. Conditional sections affect every function that occurs after them in the index space.

tlively commented 4 years ago

In the presence of conditional sections, tools would have to verify that conditional sections are arranged in such a way that each function has a statically known index before doing analyses like this. That would require being able to determine whether adjacent conditional sections are organized into groups with the same number of functions in each section such that exactly one of the sections will be activated for any input feature set. Making these assumptions explicit in the structure of the module is the motivation behind the switch-style conditional sections idea, but unfortunately that's not sufficient either.

In short, yes, this is a problem.

rossberg commented 4 years ago

The tool would have to extract the set of flags used in the module and look at all expansions resulting from the possible combinations of flag values. That is the same it would have to do with a switch.

tlively commented 4 years ago

@rossberg do you know an algorithm for this that is better than exponential in the number of flags? It seems to me that this problem is tractable only if tools can expect a very constrained usage of this feature, which is an unfortunate state of affairs if not necessarily a show stopper.

rossberg commented 4 years ago

@tlively, in the general case, you can't avoid exponential. However, I don't think that regular transformation tools would necessarily need that. For example, @sunfishcode's example:

Consider a wasm transformation that takes a module, inserts a function, and then inserts some calls to that function. With wasm today, this is relatively straightforward, and there are multiple tools that can do it. But with conditional sections, doing this correctly in general can get complex, because you can't easily insert calls which reliably name the function you inserted.

Syntactic transformations like that can actually be handled without understanding the conditions. Simply insert any new function at index 0, then all existing indices shift the same, regardless of conditions they might be wrapped in.

Or, consider an analysis tool that wants to do program slice analysis, for example, looking backwards from every WASI call which uses a filesystem path. With conditional sections, it's not possible, in general, to identify which calls in a module are calls to filesystem functions without knowing the conditions that will be selected at instantiation time.

Yes, this is true. But this is not a problem unique to Wasm, any program slicer for C has to deal with it as well. AFAIK, the user typically has to pick a concrete configuration for such tools to operate.

sunfishcode commented 4 years ago

Inserting a function at index 0, or even at the first non-imported index, isn't enough when there are any conditional imports present, since locally defined functions follow imports in the index space.

Yes, this is true. But this is not a problem unique to Wasm, any program slicer for C has to deal with it as well. AFAIK, the user typically has to pick a concrete configuration for such tools to operate.

That's indeed a limitation of C, but with conditional sections it would also be a limitation of wasm in general.

And in addition to requiring exponential time to analyze in the worst case, it would seem that tools that transform modules might have to write exponential-sized output in the worst case, in order to make sure the transformation is correct under all configurations.

tlively commented 4 years ago

And in addition to requiring exponential time to analyze in the worst case, it would seem that tools that transform modules might have to write exponential-sized output in the worst case, in order to make sure the transformation is correct under all configurations.

@sunfishcode can you give an example of this?

sunfishcode commented 4 years ago

@tlively Consider a module containing N functions, all with the same signature, and all conditional, each with its own condition. The function bodies contain miscellaneous computations, including direct calls to other functions in the module. The conditional nature of the functions mean that most calls could be made to call many different function in the module, depending on the conditions.

Now suppose I want to write a transformation that inserts instrumentation to trace cycles in the static call graph. To identify the cycles I have to analyze 2n condition sets. But I can't simply do 2n in-place transformations, because that might insert instrumentation which is on a cycle in one condition set but not in another. Assuming I don't want to simply fail, in the worst case, I don't see any alternative to duplicating the module contents 2n times, once for each condition set, to ensure that the module under each condition set is instrumented correctly.

sunfishcode commented 4 years ago

To be sure though, this is a pathological example.

tlively commented 4 years ago

Bwahahaha I love it! Yeah this is a big problem. What if we adopted the "switch" idea that bundles a sequence of conditional sections together, but forced all the sections to have the same "size" by specifying an unconditional size in the switch section header and making it a decoding error if the activated conditional section does not have that "size"? "Size" would be the number of vector elements in the section for most sections. That way it would be structurally impossible for an element's index to be different depending on the configuration.

sunfishcode commented 4 years ago

@tlively I don't have a comment on switch-like constructs in general; my only observation here is that from the perspective of wasm analysis and wasm-to-wasm transformation, being able to take an index and know that what it refers to doesn't depend on unrelated conditional things elsewhere in the index space shifting everything around makes a lot of tasks much less scary.