WebAssembly / wasi-nn

Neural Network proposal for WASI
Specify coercion of non-wasm types #62

Open geekbeast opened 8 months ago

geekbeast commented 8 months ago

The current tensor types are not all supported in core wasm.

    enum tensor-type {
        FP16,
        FP32,
        FP64,
        BF16,
        U8,
        I32,
        I64
    }

WASM supports the following core types:

- i32: 32-bit integer
- i64: 64-bit integer
- f32: 32-bit float
- f64: 64-bit float

In addition, there are some utility types such as u8, u16, u32, i32, etc., but these do not exist for floating point.

The complication is that models can take non-standard wasm types as input and produce non-standard wasm types as output (packed into bytes). Even if tensors are resources, at some point they must be created and at some point they must be interpreted for use, usually in the WASM guest.
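To make the "interpreted for use" step concrete, here is a minimal sketch of a guest decoding an FP16 tensor that arrives as packed bytes. It assumes the bytes are little-endian IEEE 754 binary16 values; the function names `f16_bits_to_f32` and `decode_f16_le` are hypothetical and are not part of wasi-nn.

```rust
/// Widen one IEEE 754 binary16 bit pattern to f32.
/// Hypothetical helper for illustration, not a wasi-nn API.
fn f16_bits_to_f32(h: u16) -> f32 {
    let negative = h & 0x8000 != 0;
    let exp = u32::from((h >> 10) & 0x1f);
    let frac = u32::from(h & 0x03ff);
    let magnitude = match exp {
        // Zero and subnormals: value is frac * 2^-24 (exact in f32).
        0 => frac as f32 / 16_777_216.0,
        // All-ones exponent: infinity or NaN.
        0x1f => {
            if frac == 0 {
                f32::INFINITY
            } else {
                f32::NAN
            }
        }
        // Normal numbers: rebias the exponent (15 -> 127) and widen the fraction.
        _ => f32::from_bits(((exp + 112) << 23) | (frac << 13)),
    };
    if negative {
        -magnitude
    } else {
        magnitude
    }
}

/// Decode a byte buffer of packed FP16 values into f32 values.
/// Assumption: little-endian byte order; a trailing odd byte is ignored.
fn decode_f16_le(bytes: &[u8]) -> Vec<f32> {
    bytes
        .chunks_exact(2)
        .map(|pair| f16_bits_to_f32(u16::from_le_bytes([pair[0], pair[1]])))
        .collect()
}
```

For example, `decode_f16_le(&[0x00, 0x3C, 0x00, 0xC0])` yields `[1.0, -2.0]`. Every guest language would need an equivalent of this for each tensor type that core wasm lacks.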

One possible behavior is that we take everything as one of the core wasm types and then coerce on the host side to the correct tensor type. Another mutually exclusive alternative is that we explicitly say we are not doing any coercion, so there must be some ability to emulate the type and it must be implicitly castable in-place by the bindings.

The latter would seem like less work for implementers and the former is likely to be more usable by consumers of the API.
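As a rough illustration of the first option (coercion on the host side), the sketch below takes f32 values from the guest and narrows them to BF16 bytes before handing them to a backend. It assumes round-to-nearest-even narrowing and little-endian output; `f32_to_bf16_bits`, `bf16_bits_to_f32`, and `coerce_f32_to_bf16_bytes` are hypothetical names, not anything wasi-nn defines today.

```rust
/// Narrow an f32 to a BF16 bit pattern using round-to-nearest-even.
/// Hypothetical helper for illustration, not a wasi-nn API.
fn f32_to_bf16_bits(x: f32) -> u16 {
    let bits = x.to_bits();
    if x.is_nan() {
        // Keep the sign and exponent, and force a quiet NaN so the
        // payload cannot be truncated into an infinity.
        return ((bits >> 16) as u16) | 0x0040;
    }
    // Add 0x7fff plus the lowest kept bit, so exact ties round to even.
    let rounding_bias = 0x7fff + ((bits >> 16) & 1);
    ((bits + rounding_bias) >> 16) as u16
}

/// Widen a BF16 bit pattern back to f32 (always exact).
fn bf16_bits_to_f32(b: u16) -> f32 {
    f32::from_bits(u32::from(b) << 16)
}

/// Host-side coercion sketch: f32 values in, packed BF16 bytes out.
/// Assumption: the backend expects little-endian byte order.
fn coerce_f32_to_bf16_bytes(values: &[f32]) -> Vec<u8> {
    values
        .iter()
        .flat_map(|v| f32_to_bf16_bits(*v).to_le_bytes())
        .collect()
}
```

With this shape the guest only ever handles f32, and the precision loss happens in one well-defined place on the host. The second option would instead push code like this into every guest's bindings.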

We can also, of course, do nothing and hope that Google adds support for fp16 to core wasm, but there are a lot of other exotic floating point types, like fp8 and fp4, that are becoming relevant to generative AI (see https://arxiv.org/abs/2310.16836v1). This may be a dead end in the hype cycle, but I suspect the techniques will be forever relevant for larger model sizes.

geekbeast commented 8 months ago

@squillace Thoughts?