NVIDIA/apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
BSD 3-Clause "New" or "Revised" License
8.17k stars
1.35k forks
issues
Include format version in distopt checkpoints
#1716
timmoon10
closed
10 months ago
0
Massively reduce LayerNorm/RMSNorm GPU memory usage in modern networks by tricking torch autograd
#1715
RuiWang1998
closed
9 months ago
11
Add the warning of distributed_fused_adam low bucket usage
#1714
shjwudp
closed
10 months ago
1
Update GroupNorm for 16 Groups
#1713
alpha0422
closed
10 months ago
1
Apex Tensor Parallelism and LoRA
#1712
conceptofmind
closed
1 week ago
4
[Transformer][Test] Skip UccP2PCommTest on single GPU
#1711
Aidyn-A
closed
10 months ago
0
AttributeError: module 'apex.amp' has no attribute 'state_dict'
#1710
caoren-shuai
opened
10 months ago
2
Is there any doc about fmha
#1709
wukong1992
opened
10 months ago
0
Failed to install.
#1708
pengsl-lab
opened
11 months ago
2
Scale optimizer state with updated distributed size
#1707
jayakrishnaanvesh
closed
10 months ago
0
The storage format of the compressed matrix in module 'apex.contrib.sparsity'
#1706
Shan2L
opened
11 months ago
1
Print the TORCH_CUDA_ARCH_LIST
#1705
yncxcw
opened
11 months ago
1
DP-independent checkpoint format for distributed Adam optimizer
#1704
timmoon10
closed
10 months ago
1
Apex installation is stucked in infinite loop with printing warnings
#1703
GalJakob
opened
11 months ago
1
Fail to install apex
#1702
DIY-Z
closed
11 months ago
5
`apex.contrib.group_norm` would better have an import guard of `group_norm_cuda`
#1701
crcrpar
opened
11 months ago
1
raise child_exception_type(errno_num, err_msg, err_filename) FileNotFoundError: [Errno 2] No such file or directory: ':/usr/local/cuda/bin/nvcc'
#1700
HSC472
opened
11 months ago
1
Add type hints to distributed Adam optimizer
#1699
timmoon10
closed
11 months ago
0
Make distributed fused lamb test names friendly to keyword filtering
#1698
crcrpar
opened
11 months ago
0
Fail to install apex: TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
#1697
yu1679959321
closed
11 months ago
2
Bf16lamb
#1696
yuanzhedong
closed
12 months ago
0
Fast CUDA NHWC Group Norm
#1695
alpha0422
closed
11 months ago
5
Using nvidia_dlprof_pytorch_nvtx.init() with apex errors out as "ModuleNotFoundError: No module named 'xentropy_cuda' "
#1694
nipunagarwala
opened
1 year ago
1
Use `torch.testing.assert_close` in test_index_mul_2d.py
#1693
crcrpar
closed
11 months ago
0
Add custom build backend to support build args
#1692
janEbert
opened
1 year ago
3
[Transformer][UCC] Fix async p2p ops
#1691
Aidyn-A
closed
1 year ago
0
Fix installation command
#1690
janEbert
closed
1 year ago
2
Use a modern tensor constructor in cudnn_gbn
#1689
crcrpar
opened
1 year ago
0
A FasterRMSNorm implementation (based on FasterLayerNorm)
#1688
Njuapp
opened
1 year ago
0
data_file = open("myways.json","r") data = json.loads(data_file.read()) print(data['intents']) KeyError Traceback (most recent call last) Cell In[72], line 3 1 data_file = open("myways.json","r") 2 data = json.loads(data_file.read()) ----> 3 print(data['intents']) KeyError: 'intents' This key error is coming though I have created a json file with intents as an object
#1687
PushkarSri
opened
1 year ago
1
sequence parallel with rmsnorm/layernorm
#1686
wlike
opened
1 year ago
0
Tkurth/sgbn fixes
#1685
azrael417
closed
1 year ago
3
Tkurth/mplamb fixed
#1684
azrael417
closed
1 year ago
0
Backprop through TransducerLoss creates NaN gradients
#1683
TheoEhrenborg
opened
1 year ago
0
ERROR: Could not build wheels for apex, which is required to install pyproject.toml-based projects
#1682
PeytonTse
opened
1 year ago
4
ERROR: Directory './' is not installable. Neither 'setup.py' nor 'pyproject.toml' found.
#1681
abbas695
closed
1 year ago
1
Updating missing build dependency in pyproject.toml
#1680
loadams
opened
1 year ago
5
`pyproject.toml` missing `packaging` dependency
#1679
calebho
opened
1 year ago
46
Tkurth/new gbn
#1678
azrael417
closed
1 year ago
0
scaled_upper_triang_masked_softmax_cuda: undefined symbol
#1677
TheGravityZero
opened
1 year ago
1
Issue Installing Apex in WSL Environment
#1676
l8g
opened
1 year ago
5
[Transformer] Do not use batch_isend_irecv for UCC
#1675
Aidyn-A
closed
1 year ago
0
I might have some pip issue while running autogpt in vs code
#1674
KTH1881
closed
1 year ago
0
[Test][Transformer] Pre-parse container version
#1673
Aidyn-A
closed
1 year ago
1
current code cannot build due to tensor.type()
#1672
ycsos
closed
1 year ago
0
AttributeError: module 'torch.distributed' has no attribute '_all_gather_base'
#1671
HloveMM
opened
1 year ago
1
bf16 support for FusedDense preventing apex build on CUDA 10.2
#1670
minostauros
opened
1 year ago
6
Add `pyproject.toml`
#1669
crcrpar
closed
1 year ago
0
Please publish versions tags to Github
#1668
h-vetinari
opened
1 year ago
1
Update setup.py
#1667
RedaGrace
closed
1 year ago
1