NVIDIA / apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
BSD 3-Clause "New" or "Revised" License · 8.42k stars · 1.4k forks
Issues (sorted newest first)
#1759 [contrib] Improve FusedAdamSWA interface and add unit tests (lirundong, closed 11 months ago, 1 comment)
#1758 add async copy for openfold swa triton kernel (azazhu, closed 11 months ago, 0 comments)
#1757 No module named 'amp_C' error for py3.9 pytorch2.1.0 cuda12.1 (rocke2020, closed 11 months ago, 1 comment)
#1756 Fused RoPE for `thd` format (yaox12, closed 10 months ago, 1 comment)
#1755 ModuleNotFoundError: No module named 'fused_layer_norm_cuda', ubuntu 22.04, Successfully installed apex-0.1 (dhamaraiselvi, opened 12 months ago, 3 comments)
#1754 Use recommended PyTorch methods to silence warnings (deepakn94, closed 12 months ago, 0 comments)
#1753 why a kernel like CUDAFunctor_add appears when testing MixedFusedRMSNorm? (HangJie720, opened 12 months ago, 0 comments)
#1752 [FusedRoPE] Fuse type conversion and cos/sin (yaox12, closed 12 months ago, 1 comment)
#1751 Avoid `.contiguous()` in fused RoPE (yaox12, closed 1 year ago, 0 comments)
#1750 [Bug] Fix a bug in fused rope (yaox12, closed 1 year ago, 0 comments)
#1749 Distributed optimizer support for contiguous param buffer with FP8 params (timmoon10, closed 1 year ago, 1 comment)
#1748 Whether to support Cuda 12.1 (yangzhipeng1108, opened 1 year ago, 7 comments)
#1747 Misc Changes (nWEIdia, closed 1 year ago, 1 comment)
#1746 A fused `apply_rotary_pos_emb` implementation for Megatron-Core (yaox12, closed 1 year ago, 0 comments)
#1745 More Precision Combinations For GroupNorm (alpha0422, closed 1 year ago, 0 comments)
#1744 GPU memory leak with Flair and APEX (astropic, opened 1 year ago, 0 comments)
#1743 Fix `rtol` in `assert_close` cleanup (eqy, closed 1 year ago, 0 comments)
#1742 Cleanup usage of `self.assertTrue(torch.allclose(...` (eqy, closed 1 year ago, 0 comments)
#1741 ninja: error: '/app/csrc/amp_C_frontend.cpp', needed by '/app/build/temp.linux-x86_64-cpython-310/csrc/amp_C_frontend.o', missing and no known rule to make it (Tolga-Karahan, closed 1 year ago, 0 comments)
#1740 Loop through all available engines for cuDNN heuristics search (minitu, closed 1 year ago, 1 comment)
#1739 add test for openfold triton mha kernel (azazhu, closed 1 year ago, 0 comments)
#1738 error: command '/usr/local/cuda-11.3/bin/nvcc' failed with exit code 1 (Brion112233, opened 1 year ago, 1 comment)
#1737 When doing pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./, shows ModuleNotFoundError: No module named 'packaging' (lainmn, opened 1 year ago, 8 comments)
#1736 fused_layer_norm_cuda.rms_forward_affine gives runtime error when run on cuda:1 (Kushdesh, opened 1 year ago, 0 comments)
#1735 Installation fails (due to recent change?) (hector-gr, opened 1 year ago, 14 comments)
#1734 Add openfold triton code (ar-nowaczynski, closed 1 year ago, 0 comments)
#1733 Add hysteresis support for AMP gradient scale update (minitu, closed 1 year ago, 1 comment)
#1732 Rui/dev fast ln (RuiWang1998, closed 1 year ago, 0 comments)
#1731 Use master weights for bfloat16 FusedAdam when master_weights=True (cbcase, opened 1 year ago, 2 comments)
#1730 is it possible to update `conda-forge/nvidia-apex` to a recent tag? (stas00, closed 1 year ago, 2 comments)
#1729 torch1.13.1 cuda11.6 python3.8 TypeError: unsupported operand type(s) for +: 'NoneType' and 'str' (yuhuai4554, opened 1 year ago, 2 comments)
#1728 FusedAdam doesn't allocate master weights for bfloat16 (cbcase, opened 1 year ago, 2 comments)
#1727 Add multi_tensor_unscale_l2norm_cuda (minitu, closed 1 year ago, 1 comment)
#1726 Rui/dev fast ln (RuiWang1998, closed 1 year ago, 0 comments)
#1725 Option to only build `amp_C` module (ezhang887, opened 1 year ago, 0 comments)
#1724 torch2.0.1 No module named 'torch._six' (darrenwang00, opened 1 year ago, 13 comments)
#1723 Distributed optimizer infrastructure for FP8 parameters (timmoon10, closed 1 year ago, 0 comments)
#1722 Apex is not correctly built for pytorch 2.1.0 (acphile, opened 1 year ago, 2 comments)
#1721 Distributed optimizer support for multiple dtypes (timmoon10, closed 1 year ago, 0 comments)
#1720 [contrib.xentropy] bfloat16 support (crcrpar, closed 1 year ago, 0 comments)
#1719 Return distributed optimizer checkpoint on all ranks (timmoon10, closed 1 year ago, 0 comments)
#1718 Adjusting test for ONNX opset 18 (now default) (borisfom, closed 1 year ago, 0 comments)
#1717 ModuleNotFoundError: No module named 'fast_multihead_attn' (ICENacl, opened 1 year ago, 4 comments)
#1716 Include format version in distopt checkpoints (timmoon10, closed 1 year ago, 0 comments)
#1715 Massively reduce LayerNorm/RMSNorm GPU memory usage in modern networks by tricking torch autograd (RuiWang1998, closed 1 year ago, 11 comments)
#1714 Add the warning of distributed_fused_adam low bucket usage (shjwudp, closed 1 year ago, 1 comment)
#1713 Update GroupNorm for 16 Groups (alpha0422, closed 1 year ago, 1 comment)
#1712 Apex Tensor Parallelism and LoRA (conceptofmind, closed 4 months ago, 4 comments)
#1711 [Transformer][Test] Skip UccP2PCommTest on single GPU (Aidyn-A, closed 1 year ago, 0 comments)
#1710 AttributeError: module 'apex.amp' has no attribute 'state_dict' (caoren-shuai, opened 1 year ago, 2 comments)
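Several issues above report missing compiled apex extension modules (#1757 `amp_C`, #1755 `fused_layer_norm_cuda`, #1717 `fast_multihead_attn`), which typically indicates apex was installed without its C++/CUDA extensions built. A minimal diagnostic sketch, assuming only that the module names quoted in those issue titles are the importable extension names:

```python
import importlib.util

# Compiled-extension module names taken from the issue titles above.
EXT_MODULES = ["amp_C", "fused_layer_norm_cuda", "fast_multihead_attn"]

def missing_apex_extensions(names=EXT_MODULES):
    """Return the subset of module names that cannot be found in the
    current environment (find_spec returns None for absent modules)."""
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    for name in missing_apex_extensions():
        # A missing entry suggests apex was built without the
        # --cpp_ext/--cuda_ext options mentioned in issue #1737.
        print(f"missing compiled extension: {name}")
```

This only checks importability; it does not verify that the extensions were built against the running CUDA/PyTorch versions, which is the subject of issues such as #1722 and #1748.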