-
While converting `aten.amin/amax` to `ttnn.min/max`, we ran into the tt-metal issues below, so these cases are currently unsupported:
- [ ] `keepdim = false` isn't supported in tt-metal (see the PyTorch sketch below)
- [ ] Re…
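
For reference, a minimal PyTorch-side sketch (not tt-metal code) of what `keepdim = false` means for `amax`, and one possible lowering that relies only on `keepdim=True` plus a squeeze; whether this decomposition is acceptable for the conversion is an assumption:

```python
import torch

x = torch.randn(2, 3, 4)

# keepdim=False drops the reduced dimension ...
out = torch.amax(x, dim=1, keepdim=False)                       # shape (2, 4)

# ... but the same result can be produced with keepdim=True followed by a
# squeeze, which is one way to lower amax onto a backend that only supports
# keepdim=True reductions.
out_workaround = torch.amax(x, dim=1, keepdim=True).squeeze(1)  # shape (2, 4)

assert torch.equal(out, out_workaround)
```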
-
Trying to reproduce Nvidia's results using slurm + enroot + pyxis:
1. downgrade the transformers and huggingface_hub libs (huggingface_hub==0.23.2 transformers==4.40.2) because the versions…
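
A minimal sketch of that downgrade step, assuming a plain pip-managed environment inside the container:

```bash
# pin the library versions mentioned above
pip install "huggingface_hub==0.23.2" "transformers==4.40.2"
```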
-
### 🚀 The feature, motivation and pitch
In float8 recipes, we need to "scale" our tensors, which consists of computing the abs-max along certain dimensions.
One given tensor is usually scaled mu…
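
For context, a minimal sketch of the scaling step being described, assuming a per-row abs-max and the e4m3 float8 format (the real recipe may reduce along different dimensions):

```python
import torch

x = torch.randn(1024, 4096, dtype=torch.bfloat16)

# abs-max along the last dimension; the scale maps that abs-max onto the
# representable range of the float8 format.
amax = x.abs().amax(dim=-1, keepdim=True).float()
scale = torch.finfo(torch.float8_e4m3fn).max / amax.clamp(min=1e-12)
x_fp8 = (x.float() * scale).to(torch.float8_e4m3fn)
```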
-
Hello, your work is great! I ran into some issues while setting up the environment. I am running the command
`conda install -c fvcore -c iopath -c conda-forge fvcore iopath`
Wh…
-
```
[rank0]: File "/opt/venv/lib/python3.10/site-packages/torch/distributed/_composable/fsdp/_fsdp_param.py", line 653, in all_gather_inputs
[rank0]: ) = sharded_local_tensor.fsdp_pre_all_gath…
-
### 🚀 The feature, motivation and pitch
Hi, the code runs fine; it's just that the generated comments and names are a bit confusing.
Say we have a function with some torch ops at the beginning…
-
I'm currently getting the following error on a simple forward pass with a transformer model when using DelayedScaling:
```
110882 [rank0]: with te.fp8_autocast(enabled=True, fp8_recipe=self.te_fp8_recipe):…
-
The fp8 recipe requires `amax` to be called on the output tensor for delayed scaling.
An example can be seen here: https://github.com/jjsjann123/triton_minitu/blob/60b30e46f015fcdf26b6dcf90f875428b799c0b3/nvfuse…
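
A hypothetical sketch of what "amax called on the output tensor" means for delayed scaling: the amax observed on the current output is pushed into a history buffer, and the scale used later is derived from that history rather than from the live tensor (the buffer length and reduction here are assumptions):

```python
import torch

amax_history = torch.zeros(16)  # rolling window of previously observed amaxes

def forward_and_record_amax(linear, x):
    y = linear(x)
    # delayed scaling needs amax computed on the output tensor of the op
    amax_history.copy_(torch.roll(amax_history, shifts=1))
    amax_history[0] = y.detach().abs().amax()
    return y

# the scale for the next iteration comes from the history, not the current output
scale = torch.finfo(torch.float8_e4m3fn).max / amax_history.max().clamp(min=1e-12)
```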
-
### 🐛 Describe the bug
I see the following error in a toy training loop with PyTorch Lightning, FSDP1, torchao.float8 and torch.compile:
```
[rank0]: File "/home/vasiliy/.conda/envs/pt_nightly_…
-
### 🚀 The feature, motivation and pitch
Sharing a repro for @bdhirsh, @tugsbayasgalan on the gaps in torch.compile for FSDP2 fp8 all-gather.
For FSDP2 fp8 all-gather, it's critical to pre-compute a…
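
A hypothetical sketch of the kind of pre-computation being referred to: deriving all per-parameter fp8 scales in one batched pass (e.g. after the optimizer step), so the fp8 cast used during all-gather does not have to compute each scale on the fly. The helper name and the single-reduction comment are assumptions for illustration, not torchao's actual API:

```python
import torch

FLOAT8_MAX = torch.finfo(torch.float8_e4m3fn).max

def precompute_scales(params, eps=1e-12):
    """Hypothetical helper: compute per-parameter fp8 scales in one pass."""
    # stack the per-parameter abs-maxes so a single (distributed) reduction
    # could combine them, instead of issuing one collective per parameter
    amaxes = torch.stack([p.detach().abs().amax() for p in params])
    return FLOAT8_MAX / amaxes.clamp(min=eps)

# usage sketch: scales = precompute_scales(model.parameters()) after optimizer.step()
```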