-
I tried to run the bfloat16 linear op of BitBLAS, but I got a different result.
Output:
quantizing /decoder/block/0/layer/0/SelfAttention/k
torch linear took by avg 7.802581787109375e-05
BitBLAS Op…
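As a first triage step (a PyTorch-only sketch, not the BitBLAS API), it helps to separate ordinary bfloat16 rounding from a genuine kernel bug by comparing a bf16 linear against a float32 reference:

```python
import torch

torch.manual_seed(0)
x = torch.randn(16, 512)
w = torch.randn(1024, 512)

ref = torch.nn.functional.linear(x, w)  # float32 reference
out = torch.nn.functional.linear(x.bfloat16(), w.bfloat16())  # bf16 path

# bf16 keeps a 7-bit mantissa (~2-3 decimal digits), so modest drift is
# expected; order-of-magnitude disagreement points at the custom kernel.
print((out.float() - ref).abs().max())
```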
-
In short, we observed that `mixed_bfloat16` on TPU is slower than `float32` in our model benchmarks. Please refer to this [sheet](https://docs.google.com/spreadsheets/d/1TPwbe8p6eD61arkoIXQnPHf3rgFIDFUZCot…
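For context, toggling the policy for such a benchmark is a one-liner in Keras (a minimal sketch using the standard `tf.keras.mixed_precision` API; the model here is a stand-in):

```python
import tensorflow as tf

# Switch between "float32" and "mixed_bfloat16" to compare TPU step times.
tf.keras.mixed_precision.set_global_policy("mixed_bfloat16")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(1024, activation="relu"),
    # Keep the output layer in float32 for a numerically stable softmax.
    tf.keras.layers.Dense(10, activation="softmax", dtype="float32"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```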
-
Good day.
After loading a saved LoRA model, I save it merged. And after loading it from the merged checkpoint, I get generations like '+++++ 1000000000000000000000000000000000000000000000000…
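For reference, a minimal sketch of the merge-and-reload flow in question (assuming the standard PEFT/Transformers APIs; model names and paths are placeholders):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("base-model", torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, "lora-adapter-dir")

# Fold the LoRA deltas into the base weights and save a plain checkpoint.
merged = model.merge_and_unload()
merged.save_pretrained("merged-dir")

# Reloading the merged checkpoint is where the garbage generations appear.
reloaded = AutoModelForCausalLM.from_pretrained("merged-dir", torch_dtype=torch.bfloat16)
```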
-
TypeError: set_default_dtype only supports [float16, float32, float64, bfloat16] , but received paddle.float32
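A minimal sketch of the call that trips this (assuming a Paddle version where the dtype object is rejected; passing the dtype name as a string is the portable form):

```python
import paddle

# Raises the TypeError above on affected versions:
# paddle.set_default_dtype(paddle.float32)

# Passing the dtype by name works across versions.
paddle.set_default_dtype("float32")
print(paddle.get_default_dtype())  # float32
```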
-
### 🐛 Describe the bug
Hi AMD Team,
On MI300X with PyTorch nightly, grouped-query attention runs into numeric errors. I have confirmed on H100 that the same script does not produce numeric errors.
C…
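A sketch of the kind of check such a repro script performs (assuming PyTorch ≥ 2.5, where `scaled_dot_product_attention` accepts `enable_gqa`):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, Hq, Hkv, S, D = 2, 8, 2, 128, 64  # query heads a multiple of kv heads
q = torch.randn(B, Hq, S, D, device="cuda", dtype=torch.bfloat16)
k = torch.randn(B, Hkv, S, D, device="cuda", dtype=torch.bfloat16)
v = torch.randn(B, Hkv, S, D, device="cuda", dtype=torch.bfloat16)

# Fused grouped-query attention path.
out = F.scaled_dot_product_attention(q, k, v, enable_gqa=True)

# Reference: expand kv heads explicitly and run the same math in float32.
k32 = k.repeat_interleave(Hq // Hkv, dim=1).float()
v32 = v.repeat_interleave(Hq // Hkv, dim=1).float()
ref = F.scaled_dot_product_attention(q.float(), k32, v32)

print((out.float() - ref).abs().max())  # large gaps indicate a kernel bug
```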
-
BFLOAT16 is a new floating-point format: 16 bits wide, with an 8-bit exponent and a 7-bit mantissa (vs. the 5-bit exponent and 10-bit mantissa of an IEEE half-precision float, which is currently …
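Since bfloat16 is simply the top 16 bits of a float32, the conversion fits in a few lines (a sketch using plain truncation; real hardware usually rounds to nearest):

```python
import struct

def to_bfloat16_bits(x: float) -> int:
    """Keep the top 16 bits of a float32: sign, 8-bit exponent, 7-bit mantissa."""
    return struct.unpack("<I", struct.pack("<f", x))[0] >> 16

def from_bfloat16_bits(bits: int) -> float:
    """Widen bfloat16 back to float32 by zero-filling the low 16 bits."""
    return struct.unpack("<f", struct.pack("<I", bits << 16))[0]

b = to_bfloat16_bits(3.14159)
print(f"{b:016b}", from_bfloat16_bits(b))  # 3.140625: ~3 decimal digits survive
```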
-
Thank you for releasing the code for these inspiring works!
I tried to use bfloat16 for the model parameters and manually converted the images and labels from float32 to bfloat16 before feeding them for train…
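A sketch of that conversion pattern in PyTorch (assuming an image-classification setup; one common pitfall is that integer class labels should stay integral rather than be cast to bfloat16):

```python
import torch

model = torch.nn.Linear(3 * 32 * 32, 10).to(torch.bfloat16)  # stand-in model

images = torch.randn(8, 3 * 32 * 32).to(torch.bfloat16)
labels = torch.randint(0, 10, (8,))  # keep labels int64 for cross-entropy

logits = model(images)
# Compute the loss in float32 for stability; the matmuls still run in bf16.
loss = torch.nn.functional.cross_entropy(logits.float(), labels)
loss.backward()
```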
-
Consider IL support for the `bfloat16` datatype, which is useful for machine-learning applications, in a future .NET version.
-
I was looking at the [FLUX.1-dev FP8 example code](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux#single-file-loading-for-the-fluxtransformer2dmodel) in the documentation and noticed…
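For reference, the rough shape of that example (a sketch assuming the single-file loading API described at the linked page; the FP8 checkpoint path is a placeholder):

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel

# Load an FP8 single-file checkpoint; compute is upcast to bfloat16.
transformer = FluxTransformer2DModel.from_single_file(
    "path/to/flux1-dev-fp8.safetensors",  # placeholder path
    torch_dtype=torch.bfloat16,
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # keep peak VRAM manageable
```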
-
### System Info
A100
### Who can help?
@byshiue
@juney-nvidia
### Information
- [ ] The official example scripts
- [x] My own modified scripts
### Tasks
- [x] An officially supported task in th…