-
While converting `aten.amin/amax` to `ttnn.min/max`, we ran into the tt-metal issues below, so these cases are currently unsupported:
- [ ] `keepdim = false` isn't supported in tt-metal (see the PyTorch sketch below)
- [ ] Re…
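
For reference, a minimal PyTorch-side sketch (not tt-metal code) of what `keepdim = false` means for `amax`, and one possible lowering that relies only on `keepdim=True` plus a squeeze; whether this decomposition is acceptable for the conversion is an assumption:

```python
import torch

x = torch.randn(2, 3, 4)

# keepdim=False drops the reduced dimension ...
out = torch.amax(x, dim=1, keepdim=False)                       # shape (2, 4)

# ... but the same result can be produced with keepdim=True followed by a
# squeeze, which is one way to lower amax onto a backend that only supports
# keepdim=True reductions.
out_workaround = torch.amax(x, dim=1, keepdim=True).squeeze(1)  # shape (2, 4)

assert torch.equal(out, out_workaround)
```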
-
Trying to reproduce Nvidia's results using slurm + enroot + pyxis:
1. downgrade the transformers and huggingface_hub libs (huggingface_hub==0.23.2 transformers==4.40.2) because the versions…
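
A minimal sketch of that downgrade step, assuming a plain pip-managed environment inside the container:

```bash
# pin the library versions mentioned above
pip install "huggingface_hub==0.23.2" "transformers==4.40.2"
```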
-
### 🚀 The feature, motivation and pitch
In float8 recipes, we need to "scale" our tensors, which consists of computing the abs-max along certain dimensions.
One given tensor is usually scaled mu…
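
For context, a minimal sketch of the scaling step being described, assuming a per-row abs-max and the e4m3 float8 format (the real recipe may reduce along different dimensions):

```python
import torch

x = torch.randn(1024, 4096, dtype=torch.bfloat16)

# abs-max along the last dimension; the scale maps that abs-max onto the
# representable range of the float8 format.
amax = x.abs().amax(dim=-1, keepdim=True).float()
scale = torch.finfo(torch.float8_e4m3fn).max / amax.clamp(min=1e-12)
x_fp8 = (x.float() * scale).to(torch.float8_e4m3fn)
```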
-
Hello, your work is great! I ran into some issues while setting up the environment. I am running the command
`conda install -c fvcore -c iopath -c conda-forge fvcore iopath`
Wh…
-
```
[rank0]: File "/opt/venv/lib/python3.10/site-packages/torch/distributed/_composable/fsdp/_fsdp_param.py", line 653, in all_gather_inputs
[rank0]: ) = sharded_local_tensor.fsdp_pre_all_gath…
-
### 🚀 The feature, motivation and pitch
Hi, the code runs fine; it's just that the generated comments and names are a bit confusing.
Say we have a function with some torch ops at the beginning…
-
I'm currently getting the following error on a simple forward pass with a transformer model when using DelayedScaling:
```
110882 [rank0]: with te.fp8_autocast(enabled=True, fp8_recipe=self.te_fp8_recipe):…
-
The fp8 recipe requires `amax` to be called on the output tensor for delayed scaling.
An example can be seen here: https://github.com/jjsjann123/triton_minitu/blob/60b30e46f015fcdf26b6dcf90f875428b799c0b3/nvfuse…
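
A hypothetical sketch of what "amax called on the output tensor" means for delayed scaling: the amax observed on the current output is pushed into a history buffer, and the scale used later is derived from that history rather than from the live tensor (the buffer length and reduction here are assumptions):

```python
import torch

amax_history = torch.zeros(16)  # rolling window of previously observed amaxes

def forward_and_record_amax(linear, x):
    y = linear(x)
    # delayed scaling needs amax computed on the output tensor of the op
    amax_history.copy_(torch.roll(amax_history, shifts=1))
    amax_history[0] = y.detach().abs().amax()
    return y

# the scale for the next iteration comes from the history, not the current output
scale = torch.finfo(torch.float8_e4m3fn).max / amax_history.max().clamp(min=1e-12)
```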
-
### 🐛 Describe the bug
I see the following error in a toy training loop with PyTorch Lightning, FSDP1, torchao.float8 and torch.compile:
```
[rank0]: File "/home/vasiliy/.conda/envs/pt_nightly_…
-
### 🚀 The feature, motivation and pitch
Sharing a repro for @bdhirsh, @tugsbayasgalan on the gaps in torch.compile for FSDP2 fp8 all-gather.
For FSDP2 fp8 all-gather, it's critical to pre-compute a…
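
A hypothetical sketch of the kind of pre-computation being referred to: deriving all per-parameter fp8 scales in one batched pass (e.g. after the optimizer step), so the fp8 cast used during all-gather does not have to compute each scale on the fly. The helper name and the single-reduction comment are assumptions for illustration, not torchao's actual API:

```python
import torch

FLOAT8_MAX = torch.finfo(torch.float8_e4m3fn).max

def precompute_scales(params, eps=1e-12):
    """Hypothetical helper: compute per-parameter fp8 scales in one pass."""
    # stack the per-parameter abs-maxes so a single (distributed) reduction
    # could combine them, instead of issuing one collective per parameter
    amaxes = torch.stack([p.detach().abs().amax() for p in params])
    return FLOAT8_MAX / amaxes.clamp(min=eps)

# usage sketch: scales = precompute_scales(model.parameters()) after optimizer.step()
```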