WebAssembly / spec

WebAssembly specification, reference interpreter, and test suite.
https://webassembly.github.io/spec/

[spec] Support address value type? #941

Closed achun closed 5 years ago

achun commented 5 years ago

Under the current specifications, the following code applies to wasm32 mode:

(i32.store (i32.const 8) (i32.const 0))
(i64.store (i32.const 8) (i64.const 0))
(f32.store (i32.const 8) (f32.const 0))
(f64.store (i32.const 8) (f64.const 0))
(i32.load (i32.const 8))
(i64.load (i32.const 8))
(f32.load (i32.const 8))
(f64.load (i32.const 8))

Under wasm64 mode, it needs to be changed to:

(i32.store (i64.const 8) (i32.const 0))
;; ...
(f64.load (i64.const 8))

An address value type would give better compatibility across wasm32 and wasm64 modes:

(i32.store (address 8) (i32.const 0))
;; ...
(f64.load (address 8))

e.g.

(func (param $p0 address) (result i32)
    (i32.load (local.get $p0))
)

Of course, this requires more detailed rules.

rossberg commented 5 years ago

64-bit memory access is not just a question of switching the index type. It will also require different instructions (or attributes on instructions) for loads and stores, and different instructions for computing addresses via arithmetic. So simply introducing a type alias would not work.

Ultimately, the producer will have to know whether it is generating code for 32 or 64 bit address space, so it's also not clear what the utility would be of such a feature.
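
To make the address-arithmetic point concrete, here is a minimal Python sketch (not Wasm code) of computing an element address with 32-bit versus 64-bit wrapping integer arithmetic, which is how Wasm's i32 and i64 add and multiply instructions behave. The function name `element_address` and the sample values are assumptions chosen for illustration.

```python
def element_address(base, index, stride, bits):
    """Compute base + index * stride with wrap-around at 2**bits,
    mimicking Wasm's modular i32 (bits=32) or i64 (bits=64) arithmetic."""
    mask = (1 << bits) - 1
    return (base + index * stride) & mask


# Hypothetical values: an array of 8-byte elements starting just below 4 GiB.
base, index, stride = 0xFFFF_FFF0, 4, 8

addr32 = element_address(base, index, stride, 32)
addr64 = element_address(base, index, stride, 64)

print(hex(addr32))  # 0x10 (wrapped around the 32-bit address space)
print(hex(addr64))  # 0x100000010
```

The same source-level expression produces different addresses depending on the width of the arithmetic, so the producer has to pick the instructions (32-bit or 64-bit) that match the memory it targets.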

achun commented 5 years ago

Sure.

As stated in the FAQ:

Portability: The ISA must be the same for every machine architecture.

Stability: The ISA and binary encoding must not change over time (or change only in ways that can be kept backward-compatible).

Although not urgent, it is also important to consider the consistency of the instructions under wasm32 and wasm64.

rossberg commented 5 years ago

Can you elaborate on what you mean by that? The idea is that 64 bit address space will be supported by new instructions or extensions of existing instructions, which is trivially backwards-compatible.

xtuc commented 5 years ago

@achun the FAQ states that the instructions themselves are independent of the machine architecture. The machine architecture and the memory space are two separate things.

achun commented 5 years ago

@xtuc Yes, your explanation is consistent with the original text. But now a wasm file cannot be compatible with both wasm32 and wasm64 modes. WASM IR is fresh, with no historical burden. It is easier to consider consistency now than to consider it later.

@rossberg

After rethinking, I think adding alias types is more lightweight: for example int or uint, equivalent to i32 under wasm32 and to i64 under wasm64. The virtual machine reinterprets the alias as the concrete type when loading a .wasm file. This is the most basic part.

Of course, this brings a long list of new problems. Some of them have theoretical solutions, while for others the pros and cons have to be weighed.

Let's discuss the problem of field offsets first.

This requires recording the layout of the struct fields in the IR, say in a section named layout. The virtual machine calculates the offset of each field when loading the .wasm file. For example:

(module
  (layout $struct_a 
     ;; This is just a demonstration; details are TODO.
     ;; Size of each field; field names are not important.
     (1 2)  ;; The 1 means a 1-byte immediate; the 2 means the field size is 2 bytes.
     (1 4)
     (3)    ;; The 3 means i32 under wasm32 and i64 under wasm64.
     (4 $struct_b)  ;; struct_b
     (5 $struct_b)  ;; *struct_b
     ;; Three bytes are enough to record the length of a field.
     ;; Maximum representable: (2 65535)
     ;; Perhaps use a two-byte bitfield instead:
     ;; xxxx xxxx xxxx yyyy
     ;; field-size     flag
  )

  (layout $struct_b (; ... omit ...;))

  (func $use_case
    (param $address int) (result int)
    ;; Replaced after loading, e.g. under wasm32:
    ;; (param $address i32) (result i32)

    (local.get $address)
    (int.load (offset $struct_a 2)) ;; The 2 is the index of the '(3)' field
    ;; Replaced after loading, e.g. under wasm32:
    ;; (i32.load offset=8)
  )
)
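
The load-time offset calculation described above can be sketched in Python. This is only an illustration under assumptions the proposal does not spell out: every field is aligned to its own size, address-sized fields take 4 bytes under wasm32 and 8 bytes under wasm64, and the struct is simplified to two fixed-size fields followed by two address-sized fields. The names `Field`, `field_offsets`, and `STRUCT_A` are made up for this sketch. With these assumptions the third field lands at offset 8, matching the `offset=8` in the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    size: int = 0             # size in bytes, used for fixed-width fields
    is_address: bool = False  # True if the field is address-sized (i32/i64)


def field_offsets(fields, address_bytes):
    """Return the byte offset of each field for a given address size.

    Assumption: each field is aligned to its own size (natural alignment).
    """
    offsets = []
    cursor = 0
    for field in fields:
        size = address_bytes if field.is_address else field.size
        # Round the cursor up to the next multiple of the field size.
        cursor = (cursor + size - 1) // size * size
        offsets.append(cursor)
        cursor += size
    return offsets


# Simplified stand-in for $struct_a: 2-byte field, 4-byte field, two addresses.
STRUCT_A = [
    Field(size=2),
    Field(size=4),
    Field(is_address=True),
    Field(is_address=True),
]

print(field_offsets(STRUCT_A, address_bytes=4))  # wasm32: [0, 4, 8, 12]
print(field_offsets(STRUCT_A, address_bytes=8))  # wasm64: [0, 4, 8, 16]
```

In this sketch the two modes agree on the first three offsets and diverge only after the first address-sized field, which is why the offsets would have to be resolved by the loader rather than fixed by the producer.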

rossberg commented 5 years ago

But now a wasm file cannot be compatible with both wasm32 and wasm64 modes.

@achun, correct, but it was a conscious decision not to make that a goal, and that choice is already engrained in the Wasm design. Personally, I think that was the right decision, because (a) I doubt that it would be a useful feature in practice, (b) it introduces a lot of complications, as you observe, and (c) we want to be able to express Wasm modules that can bridge between ones using 32 bit and ones using 64 bit address space -- a single module-global address space "mode" would badly get in the way of that.

achun commented 5 years ago

and that choice is already engrained in the Wasm design.

@rossberg, in that case it seems like a bad idea, one that will lead to some_64.wasm / some.wasm64 filenames in the future.

rossberg commented 5 years ago

@achun, there may be a misunderstanding about what Wasm 64 bit means and who is making that choice. To be clear, that is a choice that is completely independent from the target platform -- both 32 and 64 bit memories will work on all platforms. The assumption of the Wasm design is that only the producer can make an informed choice, since they know the requirements of a specific module or app. Your suggestion would only make sense if the consumer were to make that choice, but what would they base the decision on, and is there even a suitable authority on that end to make the choice?

achun commented 5 years ago

@rossberg, my reason is simple. Compilers have been doing the same thing for a long time: generating two file formats, one for 32-bit and one for 64-bit addressing, whether the output is machine code or opcodes. Of course this has historical reasons.

I think this is inefficient and not environmentally friendly.

The new WASM IR has the opportunity to improve.

I have not found that using alias types leads to any insurmountable obstacles, so why not?

WASM: Compile once, run anywhere