apache / tvm

Open deep learning compiler stack for cpu, gpu and specialized accelerators
https://tvm.apache.org/
Apache License 2.0

[WebGPU] Handle device OOM in createBuffer #17005

Closed CharlieFRuan closed 4 months ago

CharlieFRuan commented 4 months ago

Prior to this PR, WebGPU errors such as OOM were only logged as warnings and did not otherwise affect the program. This PR handles WebGPU errors using pushErrorScope() and popErrorScope(), following https://github.com/gpuweb/gpuweb/blob/main/design/ErrorHandling.md.

We replace createBuffer() with tryCreateBuffer(), which catches all three types of errors (validation, out-of-memory, and internal). For now, we treat any error that occurs in createBuffer() as fatal and therefore call device.destroy(). When a device is initialized, we use device.lost.then() to listen for the device loss triggered by device.destroy(); when it fires, we log the error and call Instance.dispose(), prompting the user to re-initialize.
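For reviewers, here is a rough sketch of the pattern described above. It is not the code in this PR. `DeviceLike`, `BufferDescriptor`, and `disposeOnDeviceLost` are hypothetical names introduced only for illustration, and the interface is a minimal structural stand-in for `GPUDevice` so the snippet compiles without `@webgpu/types`. The `dispose` callback stands in for `Instance.dispose()`.

```typescript
// Minimal structural types (assumptions for this sketch); a real GPUDevice
// is expected to fit this shape.
type ErrorFilter = "validation" | "out-of-memory" | "internal";

interface BufferDescriptor {
  size: number;
  usage: number;
}

interface DeviceLike<B> {
  readonly lost: Promise<{ reason: string; message: string }>;
  pushErrorScope(filter: ErrorFilter): void;
  popErrorScope(): Promise<{ message: string } | null>;
  createBuffer(descriptor: BufferDescriptor): B;
  destroy(): void;
}

// Illustrative version of tryCreateBuffer(): wrap createBuffer() in one error
// scope per filter and treat any captured error as fatal.
function tryCreateBuffer<B>(device: DeviceLike<B>, descriptor: BufferDescriptor): B {
  const filters: ErrorFilter[] = ["out-of-memory", "validation", "internal"];
  filters.forEach((filter) => device.pushErrorScope(filter));

  const buffer = device.createBuffer(descriptor);

  // One pop per push. Each pop resolves to the first error captured by the
  // corresponding scope, or to null if that scope saw no error.
  filters.forEach(() => {
    device.popErrorScope().then((error) => {
      if (error !== null) {
        console.error("createBuffer failed: " + error.message);
        device.destroy();
      }
    });
  });

  // The buffer is returned synchronously; errors surface later via the scopes.
  return buffer;
}

// Hypothetical helper: run `dispose` once the device is lost, which includes
// the loss caused by device.destroy() above.
function disposeOnDeviceLost<B>(device: DeviceLike<B>, dispose: () => void): void {
  device.lost.then((info) => {
    console.error("WebGPU device lost (" + info.reason + "): " + info.message);
    dispose();
  });
}
```

Because popErrorScope() is asynchronous, the caller still receives a buffer object even when allocation fails. The failure only becomes visible once the scope promises resolve, which is why the cleanup is routed through the device.lost listener instead of a thrown exception.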

See https://github.com/mlc-ai/web-llm/issues/356 for motivation.

Tested end-to-end with WebLLM.