gfx-rs / wgpu

A cross-platform, safe, pure-Rust graphics API.
https://wgpu.rs
Apache License 2.0

[meta] f16 support #4384

Open jinleili opened 2 years ago

jinleili commented 2 years ago

From WGSL spec: https://gpuweb.github.io/gpuweb/wgsl/#floating-point-types

TODO
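For orientation, a hedged sketch of what shader-side f16 usage looks like once supported, based on the linked WGSL spec (the binding layout and names here are invented for illustration):

```wgsl
// Requires the shader-f16 extension per the WGSL spec.
enable f16;

// Hypothetical storage buffer of half-precision vectors.
@group(0) @binding(0) var<storage, read_write> data: array<vec4<f16>>;

@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) id: vec3<u32>) {
    let scale: f16 = 1.5h; // the `h` suffix marks an f16 literal
    data[id.x] = data[id.x] * scale;
}
```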

teoxoy commented 2 years ago

Thanks for filing! There is some more info here: https://github.com/gpuweb/gpuweb/issues/2512

I suspect this might not be too hard to implement in the Metal and Vulkan back-ends; however, for D3D it will require us to generate DXIL (i.e. requires a new back-end; see https://github.com/gfx-rs/wgpu/issues/4302).

teoxoy commented 2 years ago

Actually, according to this table, SM6.2 with `-enable-16bit-types` should work (no need for DXIL generation).
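For reference, compiling HLSL with native 16-bit types via DXC looks something like the following (illustrative invocation; file names are placeholders):

```shell
# Shader model 6.2 or later is required for -enable-16bit-types.
dxc -T cs_6_2 -enable-16bit-types -Fo shader.dxil shader.hlsl
```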

kdashg commented 1 year ago

This one is interesting to prioritize, because it's in the v1 specs, but we don't need to support it in order to be compliant and usable for our v1 release.

luizberti commented 1 year ago

Would just like to add some weight to the prioritization of this: I am currently trying to do some machine learning and vector search stuff with WGSL, and this is currently a huge blocker to almost everything I'm trying to do. This is basically the only significant thing keeping the wgpu ecosystem from being used in the ML world, and I believe this ecosystem could be one of the most portable ways to deploy ML systems due to the nature of WebGPU.

Another big reason for having this is that even though Rust has the half crate, there is currently no way (outside of core::arch madness) to work with f16 in a SIMD fashion _(portable_simd is hopeless, at least until f16 makes it to the standard library)_. Being able to "SIMD on the GPU" would immensely alleviate this constraint.

Sorry if this sounds a bit pushy; that's not how I want to come across. I just want to emphasize that this really is an important feature, and that its absence is currently a total blocker for several important applications, especially in ML.

TimothyOlsson commented 1 year ago

I concur.

While the `unpack2x16float` built-in is usable, I have encountered difficulties using it efficiently in some use cases. This forces me to use less optimal approaches, resulting in slower speeds and higher memory usage.
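For readers unfamiliar with the workaround being described: `unpack2x16float` treats a `u32` as two packed binary16 values and widens them to `f32`. A pure-Rust sketch of the equivalent conversion (illustrative only; wgpu/naga do not expose this helper on the CPU side):

```rust
// Convert one IEEE 754 binary16 bit pattern to f32.
fn f16_bits_to_f32(h: u16) -> f32 {
    let sign = (h as u32 & 0x8000) << 16;
    let exp = ((h >> 10) & 0x1f) as u32;
    let frac = (h & 0x03ff) as u32;
    match exp {
        // Zero or subnormal: magnitude is frac * 2^-24.
        0 => {
            let mag = frac as f32 * (2.0f32).powi(-24);
            if sign != 0 { -mag } else { mag }
        }
        // Infinity or NaN: all-ones exponent in f32 as well.
        0x1f => f32::from_bits(sign | 0x7f80_0000 | (frac << 13)),
        // Normal: rebias the exponent from 15 (f16) to 127 (f32).
        _ => f32::from_bits(sign | ((exp + 112) << 23) | (frac << 13)),
    }
}

// CPU-side equivalent of WGSL's unpack2x16float: low half first.
fn unpack2x16float(e: u32) -> (f32, f32) {
    (f16_bits_to_f32(e as u16), f16_bits_to_f32((e >> 16) as u16))
}

fn main() {
    // 0x3C00 is 1.0 in binary16, 0x4000 is 2.0.
    let (lo, hi) = unpack2x16float(0x4000_3C00);
    assert_eq!(lo, 1.0);
    assert_eq!(hi, 2.0);
}
```

Doing this per element in a shader works, but it means every f16 value is widened to f32 before any arithmetic, which is part of the memory and speed cost described above.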

I understand that you may have multiple priorities to balance, but if possible, I would greatly appreciate your consideration in prioritizing the f16 feature.

FL33TW00D commented 1 year ago

Notes from digging into this today:

Would love to get more involved in shipping this feature.

teoxoy commented 1 year ago
  • We are currently casting all AbstractFloat to F32 during lexing here. This needs to be delayed until later (during parsing or lowering?).

After we are done implementing support for const-expressions, we should be able to propagate abstract types when we do the evaluation of const-expressions.
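To illustrate why the eager cast matters: per the WGSL spec, an AbstractFloat literal materializes its type from context, so it can become f16 directly. Casting every AbstractFloat to f32 during lexing would forbid that (a sketch; assumes f16 support is enabled):

```wgsl
enable f16;

fn demo() {
    // 1.0 is an AbstractFloat; with late materialization it
    // converts to f16 here without an explicit suffix or cast.
    let x: f16 = 1.0;
    // If the literal were eagerly cast to f32, this would become an
    // f32 -> f16 conversion, which WGSL does not allow implicitly.
    let y: f16 = 1.0h; // an explicit f16 literal always works
}
```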

cryscan commented 10 months ago

Hi! I am currently developing web-rwkv which implements an LLM with WGPU. It's already fast, but having this feature could make it even faster, which is super nice.

tgross35 commented 5 months ago

Just to crosslink, Rust now has a nightly-only binary16 `f16` type: https://github.com/rust-lang/rust/issues/116909. It is extremely unstable at this point, but it might make things easier to implement down the line.

FL33TW00D commented 4 months ago

#5701