WebAssembly / wasi-nn

Neural Network proposal for WASI

Specifying execution targets #2

Open abrown opened 4 years ago

abrown commented 4 years ago

The initial proposal includes an execution_target flag to load the model for a specific type of device, e.g. cpu, gpu, tpu, etc. In the WASI CG meeting, several attendees discussed changing this:

abrown commented 4 years ago

cc: @sunfishcode, @mingqiusun, @leecam, @rasquill, @jlb6740

jlb6740 commented 4 years ago

@andrew .. In general I suggested providing the target parameter (CPU, GPU, TPU) as a hint to the implementation. The hint would not be specific about, or imply, why the user was suggesting it, and ultimately the implementation would decide what to do with it. The discussion about removing the parameter seemed to revolve around performance, but I was wondering whether there might be other considerations (security, or power as you point out) that would lead someone to hint at a specific target. When you talk about WebNN's power option, could this not be in addition to the target parameter instead of in place of it? Could there be use for a target hint and, separately, a power consumption hint (options 3 and 4)?

Specifically for that question, in the context of WebNN, I am not sure whether their powerperformance parameter is meant to recommend how to throttle the target (CPU, GPU, etc.) to control power consumption, or whether it is simply meant to suggest what the target should be.

shschaefer commented 11 months ago

@jlb6740, the power preference in WebNN and many frameworks is designed to let the caller choose among more than one potential device. With GPUs, there is often more than one: a laptop may have an integrated GPU that is part of the SoC and a discrete GPU for performance (games, video, etc.).
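A minimal sketch of how a power preference could disambiguate between several devices of the same kind. Everything here is hypothetical and not part of wasi-nn or WebNN: the `Device` record, the `pick_device` helper, and the use of a `watts` figure as a rough proxy for "low power" versus "high performance" are assumptions made for illustration. The preference strings are modeled on the values WebNN is understood to use and should be treated as an assumption too.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Device:
    name: str
    kind: str      # "cpu", "gpu", or "tpu"
    watts: float   # assumed typical power draw, used only as a rough proxy


def pick_device(devices: List[Device], kind: str,
                power_preference: str = "default") -> Optional[Device]:
    """Choose one device of the requested kind, guided by a power preference."""
    candidates = [d for d in devices if d.kind == kind]
    if not candidates:
        return None  # no device of that kind on this host
    if power_preference == "low-power":
        return min(candidates, key=lambda d: d.watts)
    if power_preference == "high-performance":
        return max(candidates, key=lambda d: d.watts)
    return candidates[0]  # "default": first device the host enumerates


# Hypothetical laptop with an integrated and a discrete GPU.
laptop = [
    Device("cpu0", "cpu", 28.0),
    Device("integrated-gpu", "gpu", 15.0),
    Device("discrete-gpu", "gpu", 115.0),
]
print(pick_device(laptop, "gpu", "low-power").name)
print(pick_device(laptop, "gpu", "high-performance").name)
```

In this reading, the power preference does not throttle a device; it only selects which of the matching devices is used.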

The pattern specified there is also a good one for device selection. If the user asks for a TPU and for whatever reason it cannot be used, always fall back to the CPU. This deterministic behavior makes it easy to understand what the platform will do. If we want a "let the platform decide" option, the caller should knowingly opt in to it, as they best understand the tradeoffs they are willing to make.
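The fallback rule described here can be sketched as follows. This is an illustration under stated assumptions, not the wasi-nn API: `ExecutionTarget.AUTO` and the `resolve_target` helper are hypothetical names, and `platform_choice` stands in for whatever policy a host might apply when the caller opts in.

```python
from enum import Enum
from typing import Optional, Set


class ExecutionTarget(Enum):
    CPU = "cpu"
    GPU = "gpu"
    TPU = "tpu"
    AUTO = "auto"  # hypothetical opt-in: "let the platform decide"


def resolve_target(requested: ExecutionTarget,
                   available: Set[ExecutionTarget],
                   platform_choice: Optional[ExecutionTarget] = None
                   ) -> ExecutionTarget:
    """Map a requested target to the one that will actually be used."""
    if requested is ExecutionTarget.AUTO:
        # The caller knowingly opted in, so the platform may pick any
        # available device; the CPU remains the last resort.
        if platform_choice in available:
            return platform_choice
        return ExecutionTarget.CPU
    if requested in available:
        return requested
    # Deterministic rule: an unavailable target always falls back to the CPU.
    return ExecutionTarget.CPU


host = {ExecutionTarget.CPU, ExecutionTarget.GPU}
print(resolve_target(ExecutionTarget.TPU, host))
print(resolve_target(ExecutionTarget.AUTO, host, ExecutionTarget.GPU))
```

Note that a request for a TPU on this host resolves to the CPU even though a GPU is present; only the explicit `AUTO` request lets the platform substitute another accelerator.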

geekbeast commented 11 months ago

I agree with @shschaefer and would go one step further and say that being able to list/select specific devices in a local context would be super useful to match behavior of existing frameworks.

From an edge provider's side, this flag may affect how inference requests are handled, including how usage is billed (e.g., CPU vs. GPU). So I would definitely be supportive of keeping it in its current state.

squillace commented 11 months ago

The "Feature" of this choice is one between "get it running, speed isn't an issue" and "I happen to know precisely where this can run and only there because I designed it that way." In other words, can wasi-nn do both utility inferencing and also highly optimized inferencing?

The latter involves a) big models and b) specific pieces of hardware. "A" we are tackling in #36, IIUC. Here, it is very helpful to have a hint, but deterministically fall back to CPU - because you can't do compute without that (pretty much, though I'm sure there are some "artisanal scenarios" that compute without a general processor).

It's the failure case that's important, I think. In an artisanal case, where the module is purpose-built for hardware, having wasm is "nice" but likely only a small portability benefit. Is there any way to specify some sort of ordered list? From tpu to gpu to "default:cpu"? Power could also be a hint with a "default:handwave" fall through as well.

In centralized throughput services like a CDN or public cloud, you use hardware to optimize pure throughput (and the feature might only be possible there). Edge and tiny compute devices (gateways, telemetry collectors and processors, etc.) are unlikely to have artisanal processors for some time; they'll have CPUs and GPUs, and due to the sheer number of them they will not be able to upgrade easily or quickly. If a module "hints" at a TPU but the device only has a GPU, it would be nice to be able to fall back to the GPU before defaulting to the CPU.
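The ordered-list idea can be sketched as a simple first-match search. This is a hypothetical helper, not an existing wasi-nn function: `first_available`, its parameters, and the plain-string target names are assumptions made for illustration.

```python
from typing import Iterable, Set


def first_available(preferences: Iterable[str],
                    available: Set[str],
                    default: str = "cpu") -> str:
    """Return the first preferred target the host offers, else the default."""
    for target in preferences:
        if target in available:
            return target
    return default  # deterministic last resort, as in "default:cpu"


# A module that hints "tpu, then gpu" on an edge device with no TPU.
edge_device = {"cpu", "gpu"}
print(first_available(["tpu", "gpu"], edge_device))

# The same module on a CPU-only gateway.
gateway = {"cpu"}
print(first_available(["tpu", "gpu"], gateway))
```

With this shape, the module states its preferences once, and each host resolves them against what it actually has, so the TPU hint degrades to the GPU before reaching the CPU.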

Does this thinking apply to this issue here?