NVIDIA / apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch
BSD 3-Clause "New" or "Revised" License · 8.16k stars · 1.35k forks
Issues · sorted by: Newest
#1766 · Increase tolerance to workaround unit test failures on A100 · nWEIdia · closed · 5 months ago · 0 comments
#1765 · 64-bit indexing Adam · eqy · closed · 6 months ago · 0 comments
#1764 · apex installation failures · momo1986 · opened · 6 months ago · 1 comment
#1763 · Installation instructions don't build/install the C modules · zxti · opened · 6 months ago · 2 comments
#1762 · Apex installation fails · yang606 · opened · 6 months ago · 1 comment
#1761 · Cannot install apex on the machine of CUDA 12.2 · momo1986 · opened · 6 months ago · 6 comments
#1760 · Make fused normalization functions backward-compatible · timmoon10 · closed · 6 months ago · 2 comments
#1759 · [contrib] Improve FusedAdamSWA interface and add unit tests · lirundong · closed · 6 months ago · 1 comment
#1758 · add async copy for openfold swa triton kernel · azazhu · closed · 6 months ago · 0 comments
#1757 · No module named 'amp_C' error for py3.9 pytorch2.1.0 cuda12.1 · rocke2020 · closed · 7 months ago · 1 comment
#1756 · Fused RoPE for `thd` format · yaox12 · closed · 5 months ago · 1 comment
#1755 · ModuleNotFoundError: No module named 'fused_layer_norm_cuda', ubuntu 22.04, Successfully installed apex-0.1 · dhamaraiselvi · opened · 7 months ago · 3 comments
#1754 · Use recommended PyTorch methods to silence warnings · deepakn94 · closed · 7 months ago · 0 comments
#1753 · why a kernel like CUDAFunctor_add appears when testing MixedFusedRMSNorm? · HangJie720 · opened · 7 months ago · 0 comments
#1752 · [FusedRoPE] Fuse type conversion and cos/sin · yaox12 · closed · 7 months ago · 1 comment
#1751 · Avoid `.contiguous()` in fused RoPE · yaox12 · closed · 7 months ago · 0 comments
#1750 · [Bug] Fix a bug in fused rope · yaox12 · closed · 7 months ago · 0 comments
#1749 · Distributed optimizer support for contiguous param buffer with FP8 params · timmoon10 · closed · 7 months ago · 1 comment
#1748 · Whether to support Cuda 12.1 · yangzhipeng1108 · opened · 7 months ago · 4 comments
#1747 · Misc Changes · nWEIdia · closed · 7 months ago · 1 comment
#1746 · A fused `apply_rotary_pos_emb` implementation for Megatron-Core · yaox12 · closed · 7 months ago · 0 comments
#1745 · More Precision Combinations For GroupNorm · alpha0422 · closed · 8 months ago · 0 comments
#1744 · GPU memory leak with Flair and APEX · astropic · opened · 8 months ago · 0 comments
#1743 · Fix `rtol` in `assert_close` cleanup · eqy · closed · 8 months ago · 0 comments
#1742 · Cleanup usage of `self.assertTrue(torch.allclose(...` · eqy · closed · 8 months ago · 0 comments
#1741 · ninja: error: '/app/csrc/amp_C_frontend.cpp', needed by '/app/build/temp.linux-x86_64-cpython-310/csrc/amp_C_frontend.o', missing and no known rule to make it · Tolga-Karahan · closed · 8 months ago · 0 comments
#1740 · Loop through all available engines for cuDNN heuristics search · minitu · closed · 8 months ago · 1 comment
#1739 · add test for openfold triton mha kernel · azazhu · closed · 8 months ago · 0 comments
#1738 · error: command '/usr/local/cuda-11.3/bin/nvcc' failed with exit code 1 · Brion112233 · opened · 9 months ago · 1 comment
#1737 · When doing pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./, shows ModuleNotFoundError: No module named 'packaging' · lainmn · opened · 9 months ago · 8 comments
#1736 · fused_layer_norm_cuda.rms_forward_affine gives runtime error when run on cuda:1 · Kushdesh · opened · 9 months ago · 0 comments
#1735 · Installation fails (due to recent change?) · hector-gr · opened · 9 months ago · 12 comments
#1734 · Add openfold triton code · ar-nowaczynski · closed · 9 months ago · 0 comments
#1733 · Add hysteresis support for AMP gradient scale update · minitu · closed · 9 months ago · 1 comment
#1732 · Rui/dev fast ln · RuiWang1998 · closed · 9 months ago · 0 comments
#1731 · Use master weights for bfloat16 FusedAdam when master_weights=True · cbcase · opened · 9 months ago · 2 comments
#1730 · is it possible to update `conda-forge/nvidia-apex` to a recent tag? · stas00 · closed · 9 months ago · 2 comments
#1729 · torch1.13.1 cuda11.6 python3.8 TypeError: unsupported operand type(s) for +: 'NoneType' and 'str' · yuhuai4554 · opened · 9 months ago · 2 comments
#1728 · FusedAdam doesn't allocate master weights for bfloat16 · cbcase · opened · 9 months ago · 2 comments
#1727 · Add multi_tensor_unscale_l2norm_cuda · minitu · closed · 9 months ago · 1 comment
#1726 · Rui/dev fast ln · RuiWang1998 · closed · 9 months ago · 0 comments
#1725 · Option to only build `amp_C` module · ezhang887 · opened · 10 months ago · 0 comments
#1724 · torch2.0.1 No module named 'torch._six · darrenwang00 · opened · 10 months ago · 11 comments
#1723 · Distributed optimizer infrastructure for FP8 parameters · timmoon10 · closed · 9 months ago · 0 comments
#1722 · Apex is not correctly built for pytorch 2.1.0 · acphile · opened · 10 months ago · 2 comments
#1721 · Distributed optimizer support for multiple dtypes · timmoon10 · closed · 10 months ago · 0 comments
#1720 · [contrib.xentropy] bfloat16 support · crcrpar · closed · 10 months ago · 0 comments
#1719 · Return distributed optimizer checkpoint on all ranks · timmoon10 · closed · 10 months ago · 0 comments
#1718 · Adjusting test for ONNX opset 18 (now default) · borisfom · closed · 10 months ago · 0 comments
#1717 · ModuleNotFoundError: No module named 'fast_multihead_attn' · ICENacl · opened · 10 months ago · 4 comments