We have circulated these ideas a bit, and I think it might be worthwhile to kick off some discussions.
Background and Motivation
DLPack supports many frameworks in the Python ecosystem across a broad set of platforms. One emerging direction that would be super nice to think about is support for the Web platform.
Specifically, with the emergence of WebGPU, it would be really nice for frameworks that operate in the browser environment to be able to perform zero-cost exchange. The primary use case here would be WebGPU, but it can also include WebAssembly memory. For example, it would be ideal if a WebGPU NDArray/Tensor backed by framework A could be used by framework B in a zero-copy fashion.
Borrowing past experience from DLPack, we find that some initial discussion among broad stakeholders can be helpful to form a minimal but sufficient foundation that frameworks can share. So we would like to start an initial kickoff discussion and welcome everyone to chime in here before we come down to any concrete actions.
This thread aims to be an initial kick-off discussion to gauge people's preferences and see what kind of minimal common format makes sense for the web environment. Based on our past lessons, with sufficient input and prior knowledge, something common will likely emerge to form a minimal artifact that frameworks can share and reuse. The DLPack header is one such example.
Some Initial Technical Considerations
We list some of the initial design considerations.
Reuse of the data structure
DLPack already comes with a WebGPU device flag, and the overall layout can be reused for array exchange. One potential starting point is to replicate most of the array structure in JavaScript, which allows effective exchange among frameworks. Following the lessons of DLPack's success, it is important that the exchanged tensor carries a deleter, which allows each framework to define its own memory pool and its own way of recycling memory.
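As a rough illustration (not a settled design), the JavaScript mirror could follow DLPack's DLTensor/DLManagedTensor split; all names below, such as WebDLTensor, are hypothetical placeholders:

// Hypothetical dtype descriptor, mirroring DLDataType (code/bits/lanes).
interface WebDLDataType {
  code: number;  // type code as in DLPack, e.g. 2 for float
  bits: number;
  lanes: number;
}

// Hypothetical JS mirror of DLTensor backed by a WebGPU buffer.
interface WebDLTensor {
  data: GPUBuffer;  // the underlying WebGPU buffer
  device: { deviceType: number; deviceId: number };
  dtype: WebDLDataType;
  shape: number[];
  strides?: number[];  // undefined implies compact row-major layout
  byteOffset: number;
}

// Mirror of DLManagedTensor: the producer attaches a deleter so it keeps
// control over how the memory is recycled (e.g. via its own pool).
interface WebDLManagedTensor {
  dlTensor: WebDLTensor;
  // called exactly once by the consumer when it no longer needs the data
  deleter: () => void;
}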
Minimum WebGPU Harness Among Frameworks
Unlike CUDA, where the NVIDIA driver defines common global devices that every application can simply refer to as "cuda:0" and "cuda:1", there is no standard default WebGPU device globally. Each application has to use its own adapter to create a WebGPU device. While this enables flexibility across applications, it prevents potential sharing among them: the WebGPU device from app0 can be different from the one in app1.
To enable sharing, we need the ability to ensure that "webgpu:0" from framework A is the same device as "webgpu:0" from framework B, which calls for something similar to the common CUDA runtime layer.
To resolve this problem, we need a minimal harness across frameworks to create a common WebGPU device. Of course, this means that the frameworks would need to depend on this piece of code (e.g. possibly as a webdlpack package), but the intention is to keep it minimal, so frameworks can, for example, still have their own memory pools and runtime mechanisms internally if necessary.
Here is one initial strawman just to demonstrate the idea:
// common context shared across frameworks
// find a way to have frameworks obtain a global singleton in the env
class WebDLPackContext {
  // common setup logic to request the WebGPU device
  async setup(cfg) {
  }
  // called by frameworks to get the default device by ID
  getWebGPUDevice(device_id: number): GPUDevice {
  }
}
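For illustration, usage could then look like the sketch below; how frameworks discover the shared singleton (here hung off globalThis, purely as an assumption) is exactly one of the open questions:

// Hypothetical: both frameworks resolve the same shared singleton,
// e.g. one attached to globalThis by a webdlpack package.
const ctx = (globalThis as any).webDLPackContext as WebDLPackContext;
await ctx.setup({ powerPreference: "high-performance" });  // cfg shape is hypothetical

// framework A and framework B now agree on what "webgpu:0" means
const deviceInA = ctx.getWebGPUDevice(0);
const deviceInB = ctx.getWebGPUDevice(0);
console.assert(deviceInA === deviceInB);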
WebAssembly Compatibility
Some frameworks might need to compile to WASM. That means a common C-ABI-compatible layout in memory would be useful. Luckily, DLPack already provides that. The main follow-up question is to make sure frameworks agree on a common WebGPU harness. One of the main things that is missing is support for WASM buffer pointer translation.
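To make the C-ABI reuse concrete, here is a sketch of how a JS-side consumer might decode the leading fields of a DLTensor sitting in WASM linear memory. The field offsets assume a 32-bit WASM target and the DLTensor layout from dlpack.h; treat them as an assumption to verify, not a spec:

// Sketch: read the leading DLTensor fields out of WASM linear memory.
// Assumed wasm32 offsets: data (u32 ptr) at 0, device_type (i32) at 4,
// device_id (i32) at 8, ndim (i32) at 12.
function readDLTensorHeader(memory: WebAssembly.Memory, ptr: number) {
  const view = new DataView(memory.buffer);
  return {
    data: view.getUint32(ptr + 0, true),       // for WebGPU: a GPU pointer, not a CPU address
    deviceType: view.getInt32(ptr + 4, true),  // expected to be the WebGPU device type code
    deviceId: view.getInt32(ptr + 8, true),
    ndim: view.getInt32(ptr + 12, true),
  };
}

The data field recovered this way is exactly the GPU pointer that needs translation, which is what the next point is about.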
Specifically, when we have a GPU pointer in WASM, such a pointer needs to be translated to a WebGPU buffer that is retained by the JavaScript runtime. If we want a common layer that also supports exchange among WASM DLTensors, then we will need a common WASM buffer translation layer that handles buffer allocation and translation with the following functions:
GPUPointer: an alias of number
allocWebGPU(): GPUPointer
gpuPointerToBuffer(ptr: GPUPointer): GPUBuffer
freeWebGPU(ptr: GPUPointer)
// common context shared across frameworks
// find a way to have frameworks obtain a global singleton in the env
class WebDLPackContext {
  // WebGPU buffers allocated through the translation layer,
  // keyed by their GPU pointer value
  private gpuBuffers: Record<number, GPUBuffer> = {};
  // common setup logic to request the WebGPU device
  async setup(cfg) {
  }
  // called by frameworks to get the default device by ID
  getWebGPUDevice(device_id: number): GPUDevice {
  }
  // translation layer that maps a WASM-side GPU pointer to the
  // GPUBuffer retained by the JavaScript runtime
  getBufferFromGPUPtr(ptr: number): GPUBuffer {
  }
}
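To show how those stubs could behave, here is a sketch that fills in the allocation and translation logic with a simple monotonic counter for pointer values; the explicit device and nbytes parameters are additions for self-containedness and deviate from the bare allocWebGPU() signature above:

type GPUPointer = number;

// Sketch: hand out synthetic "pointers" and remember which GPUBuffer each
// one stands for, so WASM code can pass them around as plain numbers.
class GPUPointerTable {
  private nextPtr: GPUPointer = 1;
  private gpuBuffers: Record<GPUPointer, GPUBuffer> = {};

  // allocate a buffer of nbytes and return its synthetic pointer
  allocWebGPU(device: GPUDevice, nbytes: number): GPUPointer {
    const buffer = device.createBuffer({
      size: nbytes,
      usage: GPUBufferUsage.STORAGE | GPUBufferUsage.COPY_SRC | GPUBufferUsage.COPY_DST,
    });
    const ptr = this.nextPtr++;
    this.gpuBuffers[ptr] = buffer;
    return ptr;
  }

  // translate a synthetic pointer back to the retained buffer
  gpuPointerToBuffer(ptr: GPUPointer): GPUBuffer {
    return this.gpuBuffers[ptr];
  }

  // release the buffer and forget the pointer
  freeWebGPU(ptr: GPUPointer): void {
    this.gpuBuffers[ptr].destroy();
    delete this.gpuBuffers[ptr];
  }
}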