-
With Julia 1.11 coming up, we will have native support for BFloat16 (https://github.com/JuliaLang/julia/pull/51470).
Metal also supports BFloat16 onwards from [Apple6 GPU architecture](https://develo…
-
**train with bfloat16**
Is there a plan to support bfloat16 training? @maxhgerlach
-
### 🐛 Describe the bug
Category | Name | Inductor vs. Eager [XPU] | Inductor vs. Eager [CUDA] | XPU vs. CUDA [Eager] | XPU vs. CUDA [Inductor]
-- | -- | -- | -- | -- | --
huggingface_amp_fp16_tra…
-
### 🐛 Describe the bug
Category | Model | Accuracy
-- | -- | --
timm_models_amp_bf16_training | botnet26t_256 | fail_accuracy
timm_models_amp_fp16_training | botnet26t_…
-
I tried to use QAT to quantize the Qwen2 1.5B model.
The error is raised from the function `training.load_from_full_model_state_dict(
model, model_state_dict, self._device, self._is_rank_zero, strict=T…
-
I found that v2.6.3's `flash_attn_varlen_func` runs faster than v2.7.0.post2's `flash_attn_varlen_func` on H100.
Code:
```python
import torch
from hopper.flash_attn_interface import flash_attn_func, flash…
```
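For reference, here is a minimal timing sketch of this kind of comparison. It is not the truncated script above: it assumes the stable `flash_attn` 2.x import path (the hopper/FA3 `flash_attn_interface` used in the issue may take slightly different arguments), and the shapes and sequence lengths are made up.

```python
import torch
from flash_attn import flash_attn_varlen_func

torch.manual_seed(0)
batch, seqlen, nheads, headdim = 8, 4096, 32, 128
total = batch * seqlen

q, k, v = (torch.randn(total, nheads, headdim, device="cuda", dtype=torch.bfloat16)
           for _ in range(3))
# Cumulative sequence lengths for the "varlen" (packed) layout.
cu_seqlens = torch.arange(0, (batch + 1) * seqlen, seqlen,
                          device="cuda", dtype=torch.int32)

def bench(iters=50):
    # Warm up, then time with CUDA events so the kernels are measured
    # without host-side launch noise.
    for _ in range(10):
        flash_attn_varlen_func(q, k, v, cu_seqlens, cu_seqlens, seqlen, seqlen, causal=True)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        flash_attn_varlen_func(q, k, v, cu_seqlens, cu_seqlens, seqlen, seqlen, causal=True)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # ms per call

print(f"{bench():.3f} ms/iter")  # run once per installed flash-attn version
```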
-
### Background and motivation
The bfloat16 type provides the same number range as the 32-bit IEEE 754 single-precision floating point type, but with reduced precision (the significand shrinks from 24 bits to 8 bits). This is…
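To make the trade-off concrete, here is a small illustration (using PyTorch's dtype metadata purely for demonstration, an assumption on my part; the proposal itself is not tied to any particular library) comparing bfloat16's range and precision with float32 and float16:

```python
import torch

# Range: bfloat16 keeps float32's 8 exponent bits, so its maximum is close
# to float32's, while float16 tops out at 65504.
print(torch.finfo(torch.float32).max)   # ~3.4028e+38
print(torch.finfo(torch.bfloat16).max)  # ~3.3895e+38
print(torch.finfo(torch.float16).max)   # 65504.0

# Precision: machine epsilon reflects the 8-bit significand (2**-7)
# versus float32's 24-bit significand (2**-23).
print(torch.finfo(torch.float32).eps)   # ~1.19e-07
print(torch.finfo(torch.bfloat16).eps)  # 0.0078125

# One consequence: integers above 256 are no longer exactly representable.
print(torch.tensor(257.0, dtype=torch.bfloat16))  # tensor(256., dtype=torch.bfloat16)
```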
-
Thank you for making this very useful and well-tested library! Are you planning to add support for the bfloat16 format, which is used in the ML field? It has different bit widths for mantissa and exponent, bu…
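To illustrate what that split looks like (1 sign bit, 8 exponent bits, 7 explicit mantissa bits), here is a hedged sketch in plain Python of the usual float32 -> bfloat16 conversion. It is not this library's API, just the standard truncate-and-round trick; NaN handling is omitted for brevity.

```python
import struct

def float32_to_bfloat16_bits(x: float) -> int:
    # bfloat16 keeps float32's sign and 8 exponent bits but drops the low
    # 16 mantissa bits, so conversion is a rounding of the float32 bit pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)     # round-to-nearest-even
    return ((bits + rounding_bias) >> 16) & 0xFFFF  # NaN not special-cased here

def bfloat16_bits_to_float32(b: int) -> float:
    # Widening back to float32 is exact: just append 16 zero bits.
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]

print(bfloat16_bits_to_float32(float32_to_bfloat16_bits(3.14159265)))  # 3.140625
```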
-
I'm encountering an issue with the Mochi VAE Decode Spatial Tiling node when running it on an Apple M1 Max.
![image](https://github.com/user-attachments/assets/7efd22cf-33c0-4bc6-b12f-2de9fb4b8f4b)
…
-
### 🐛 Describe the bug
Dear all,
We seem to have found a bug in nn.Linear forwarding; here is a minimal example:
```python
# import
import torch
import time
# Set input size, output size, an…
```