WebAssembly / memory64

Memory with 64-bit indexes
Other
195 stars 21 forks source link

Text format for inline data in memories #5

Closed alexcrichton closed 2 years ago

alexcrichton commented 3 years ago

Currently the text format allows for memories to be defined with data inline:

(memory (data "..."))

where the limits are automatically calculated and this "elaborates" to a memory declaration and a data segment. Currently the text format in the Overview.md I think doesn't have an update to account for this, and I presume it'll likely change to:

(memory i64 (data "..."))

for 64-bit memories?

A very minor issue, but wanted to make sure it was logged!

aardappel commented 3 years ago

Yes, I thought that was the most logical change too. Except we already have:

(memory (shared i64 1 1))`

for example, where the limits, and thus the i64 goes inside the list. What I've implemented for Binaryen follows the same scheme, i.e. (memory (data i64 "...")).

Then there is export / import which might come before the limits.

We should not do it differently for shared and data since that would be both inconsistent and hard to parse.

Alternatively we can always put the i64 as the first thing after memory no matter what, i.e. even before an export or a name, but then it is disconnected from the limits which doesn't seem great. You could put it after the export / name, but before data, which is again a bit trickier to parse.

binji commented 3 years ago

Hm, where is (memory (shared i64 1 1)) syntax used? I see (memory 1 1 shared) typically. The inline export always comes first in a definition, so we shouldn't change it for this case.

aardappel commented 3 years ago

Encountered this in Binaryen, see its current parsing code: https://github.com/WebAssembly/binaryen/blob/2aa0cf300998c62aea8cc6698f8325653a9f0895/src/wasm/wasm-s-parser.cpp#L2010-L2093

Example use: https://github.com/WebAssembly/binaryen/blob/master/test/atomics.wast @kripken

binji commented 3 years ago

Hm, yeah that doesn't match the expected format. See the reference interpreter, for example:

memory_type :
  | limits { MemoryType ($1, Unshared) }
  | limits SHARED { MemoryType ($1, Shared) }
aardappel commented 3 years ago

Ok, maybe I should rework Binaryen to remove this special form as well, since it is hard to parse supporting multiple of these forms.

So the final order we want is: name (and import export) -> index type -> limits or data -> "shared" optionally ?

binji commented 3 years ago

yes, that sounds right, though some of the combinations perhaps not. For example, if we allow it, shared should probably come before the data in the sugared form, though that's not currently implemented anywhere, AFAIK.

aardappel commented 3 years ago

Given that there's ~100 occurrences of (shared in Binaryen tests, and that it appears to be a Binaryen specific bug, I'll leave that to be solved in a future commit (rather than reworking it in my in-progress one).