Open jinleili opened 2 years ago
Thanks for filing! There is some more info here: https://github.com/gpuweb/gpuweb/issues/2512
I suspect this might not be too hard to implement in the Metal and Vulkan back-ends; however, for D3D it will require us to generate DXIL (i.e. it requires a new back-end; see https://github.com/gfx-rs/wgpu/issues/4302).
Actually, according to this table, SM 6.2 and `-enable-16bit-types` should work (no need for DXIL generation).
This one is interesting to prioritize: it's in the v1 spec, but we don't need to support it in order to be compliant and usable for our v1 release.
Would just like to add some weight to the prioritization of this: I am currently trying to do some machine learning and vector search work with WGSL, and this is a huge blocker for almost everything I'm trying to do. It is basically the only significant thing keeping the wgpu ecosystem from being used in the ML world, and I believe this ecosystem could be one of the most portable ways to deploy ML systems, given the nature of WebGPU.
Another big reason for having this: even though Rust has the half crate, there is currently no way (outside of core::arch madness) to work with f16 in SIMD fashion _(portable_simd is hopeless, at least until f16 makes it into the standard library)_. Being able to "SIMD on the GPU", as sketched below, would immensely alleviate this constraint.
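For concreteness, here is a rough sketch of what "SIMD on the GPU" could look like in WGSL once f16 support lands. The bindings, buffer names, and workgroup size are purely illustrative, not from this thread:

```wgsl
// Sketch only: requires the f16 extension tracked by this issue.
enable f16;

// Illustrative bindings; a, b, and out are hypothetical names.
@group(0) @binding(0) var<storage, read> a: array<vec4<f16>>;
@group(0) @binding(1) var<storage, read> b: array<vec4<f16>>;
@group(0) @binding(2) var<storage, read_write> out: array<vec4<f16>>;

@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) gid: vec3<u32>) {
    let i = gid.x;
    if (i >= arrayLength(&out)) {
        return;
    }
    // Four half-precision multiply-adds per invocation,
    // at half the memory traffic of the f32 equivalent.
    out[i] = a[i] * b[i] + out[i];
}
```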
Sorry if this sounds a bit pushy; that's not how I want to come across. I just want to emphasize that this really is an important feature, and that it is currently a total blocker for several important applications, especially in ML.
I concur.
While the current `unpack2x16float` builtin is usable, I have encountered difficulties using it efficiently in some use cases. This forces me into less optimal approaches, resulting in slower speed and higher memory usage.
I understand that you may have multiple priorities to balance, but if possible, I would greatly appreciate your consideration in prioritizing the f16 feature.
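For context, the workaround being described looks roughly like this in WGSL today; the binding layout and names are illustrative, not taken from the commenter's code. Half-precision data has to live as packed `u32` pairs, and all arithmetic is widened to `f32`, which is where the extra memory and register pressure comes from:

```wgsl
// Workaround in current WGSL: no f16, so half-precision data is stored
// as two 16-bit floats packed into each u32. Names are hypothetical.
@group(0) @binding(0) var<storage, read> packed_halves: array<u32>;
@group(0) @binding(1) var<storage, read_write> out: array<f32>;

@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) gid: vec3<u32>) {
    let i = gid.x;
    if (i >= arrayLength(&packed_halves)) {
        return;
    }
    // unpack2x16float widens both halves to f32; every value then
    // occupies a full 32-bit register for the rest of the shader.
    let pair: vec2<f32> = unpack2x16float(packed_halves[i]);
    out[2u * i] = pair.x;
    out[2u * i + 1u] = pair.y;
}
```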
Notes from digging into this today:

- We are currently casting all AbstractFloat values to F32 during lexing here. This needs to be delayed until later (during parsing or lowering?).
- After we are done implementing support for const-expressions, we should be able to propagate abstract types when we evaluate const-expressions (see the illustration after these notes).

Would love to get more involved in shipping this feature.
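To illustrate why the lexer-level cast matters, here is a small sketch (assuming the f16 extension): if an unsuffixed literal is forced to f32 at lex time, it can no longer materialize directly as f16 later, so abstract types have to survive until type inference:

```wgsl
enable f16;

// An unsuffixed literal is AbstractFloat; it should concretize from
// context rather than being rounded through f32 during lexing.
const x: f16 = 1.5;  // abstract 1.5 materializes as f16
const y = 1.5h;      // the h suffix requests f16 explicitly
const z = x * y;     // inferred as f16
```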
Hi! I am currently developing web-rwkv, which implements an LLM with wgpu. It's already fast, but having this feature could make it even faster, which would be super nice.
Just to crosslink: Rust now has a nightly-only binary16 f16 type, tracked at https://github.com/rust-lang/rust/issues/116909. It is extremely unstable at this point, but it might make things easier to implement down the line.
From WGSL spec: https://gpuweb.github.io/gpuweb/wgsl/#floating-point-types
TODO