-
Hi, if I have a linear layer whose weights only take values in {0, 1, -1}, is it possible to use your kernel for weight compression and inference speed-up? My current weights are in bfloat16 format.
…
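(For context: three states need only 2 bits per weight, so packing promises up to 8x compression over bfloat16. A minimal numpy sketch of the idea follows; the layout and function names are illustrative, not the kernel's actual format.)

```python
import numpy as np

def pack_ternary(w: np.ndarray) -> np.ndarray:
    """Pack ternary weights {-1, 0, 1} into 2 bits each (4 per byte).
    Illustrative layout only; a real kernel would define its own."""
    codes = (w.astype(np.int8) + 1).astype(np.uint8)      # map {-1,0,1} -> {0,1,2}
    codes = np.pad(codes.ravel(), (0, -codes.size % 4))   # round size up to 4
    c = codes.reshape(-1, 4)
    return (c[:, 0] | (c[:, 1] << 2) | (c[:, 2] << 4) | (c[:, 3] << 6)).astype(np.uint8)

def unpack_ternary(packed: np.ndarray, n: int) -> np.ndarray:
    """Recover the first n ternary values as floats ready for bf16 casting."""
    shifts = np.array([0, 2, 4, 6], dtype=np.uint8)
    codes = (packed[:, None] >> shifts) & 0b11
    return codes.ravel()[:n].astype(np.float32) - 1.0

w = np.random.choice([-1, 0, 1], size=16)
assert np.array_equal(unpack_ternary(pack_ternary(w), w.size), w.astype(np.float32))
```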
-
## Feature
Hi,
I am working on PaddlePaddle (a Chinese DL framework). We and other DL frameworks would benefit greatly from an integrated bfloat16 numpy datatype. I have seen that TF added its own imple…
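A standalone numpy bfloat16 dtype has since been factored out of the TF/JAX codebases into the ml_dtypes package; a minimal sketch of the intended usage, assuming that package is installed:

```python
import numpy as np
import ml_dtypes  # pip install ml_dtypes; registers bfloat16 with numpy

x = np.array([1.0, 2.5, 3.14159], dtype=ml_dtypes.bfloat16)
print(x)                      # values rounded to bfloat16 precision
print(x.dtype)                # bfloat16
print(x.astype(np.float32))  # lossless upcast back to float32
```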
-
### Search before asking
- [X] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/ultralytics/ultralytics/discussion…
-
### 🐛 Describe the bug
Category | Model | Accuracy
-- | -- | --
timm_models_amp_bf16_training | botnet26t_256 | fail_accuracy
timm_models_amp_fp16_training | botnet26t_…
-
![image](https://github.com/user-attachments/assets/8a63a25f-74ef-4596-a1a4-c6fc2dc48e10)
My bm-smi versions: sophon-driver, sophon-libsophon, and sophon-libsophon-dev are all at version 0.5.1.
I also printed my Bmodel, and it looks fine.…
-
**train with bfloat16**
Is there a plan to support bfloat16 training? @maxhgerlach
-
By default, torchtitan uses FSDP2 mixed precision (param_dtype=bfloat16, reduce_dtype=float32).
For low-precision dtypes (float8 and int8), it's natural to compare the loss curve with bfloat16 and see how…
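For concreteness, a minimal sketch of that default policy using FSDP2's fully_shard API (the import path varies across PyTorch releases, and this is not torchtitan's actual code):

```python
import torch
from torch.distributed.fsdp import MixedPrecisionPolicy, fully_shard

# Parameters are cast to bf16 for compute/communication,
# while gradient reductions accumulate in fp32 for stability.
mp_policy = MixedPrecisionPolicy(
    param_dtype=torch.bfloat16,
    reduce_dtype=torch.float32,
)
# Applied per module, e.g.: fully_shard(transformer_block, mp_policy=mp_policy)
```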
-
Thank you for making this very useful and well-tested library! Are you planning to add support for the bfloat16 format, which is widely used in the ML field? It has different bit widths for the mantissa and exponent, bu…
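For context, bfloat16 is simply the top 16 bits of a float32: 1 sign bit, the same 8 exponent bits, and 7 explicit mantissa bits. A minimal pure-Python sketch of the field split (no assumptions about this library's API):

```python
import struct

def bf16_fields(x: float) -> tuple[int, int, int]:
    """Split a value's bfloat16 encoding into (sign, exponent, mantissa).
    bfloat16 is the top 16 bits of the float32 encoding: 1 + 8 + 7 bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0] >> 16
    return bits >> 15, (bits >> 7) & 0xFF, bits & 0x7F

print(bf16_fields(1.0))   # (0, 127, 0): same exponent bias of 127 as float32
print(bf16_fields(-2.5))  # (1, 128, 32)
```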
-
both [fused-attention](https://triton-lang.org/main/getting-started/tutorials/06-fused-attention.html#sphx-glr-getting-started-tutorials-06-fused-attention-py) and [flash-attn-og](https://github.com/D…
-
### Background and motivation
The bfloat16 type provides the same number range as the 32-bit IEEE 754 single-precision floating-point type, but with reduced precision (24 significand bits -> 8 significand bits). This is…
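To make the range-vs-precision trade-off concrete, here is a small sketch using round-toward-zero truncation of float32 (the simplest conversion; hardware typically rounds to nearest-even):

```python
import struct

def to_bf16(x: float) -> float:
    """Truncate a float32 to bfloat16 and widen back (keeps the top 16 bits)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0] & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

print(to_bf16(3.0e38))       # ~2.99e38: huge values survive (same exponent range)
print(to_bf16(1.0 + 2**-8))  # 1.0: increments below 1.0's ulp of 2**-7 are lost
print(to_bf16(1.0 + 2**-7))  # 1.0078125: smallest representable step above 1.0
```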