NVIDIA / apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
BSD 3-Clause "New" or "Revised" License · 8.42k stars · 1.4k forks
Issues (sorted newest first)
#1759 [contrib] Improve FusedAdamSWA interface and add unit tests (lirundong, closed 11 months ago, 1 comment)
#1758 add async copy for openfold swa triton kernel (azazhu, closed 11 months ago, 0 comments)
#1757 No module named 'amp_C' error for py3.9 pytorch2.1.0 cuda12.1 (rocke2020, closed 11 months ago, 1 comment)
#1756 Fused RoPE for `thd` format (yaox12, closed 10 months ago, 1 comment)
#1755 ModuleNotFoundError: No module named 'fused_layer_norm_cuda', ubuntu 22.04, Successfully installed apex-0.1 (dhamaraiselvi, opened 12 months ago, 3 comments)
#1754 Use recommended PyTorch methods to silence warnings (deepakn94, closed 12 months ago, 0 comments)
#1753 why a kernel like CUDAFunctor_add appears when testing MixedFusedRMSNorm? (HangJie720, opened 12 months ago, 0 comments)
#1752 [FusedRoPE] Fuse type conversion and cos/sin (yaox12, closed 12 months ago, 1 comment)
#1751 Avoid `.contiguous()` in fused RoPE (yaox12, closed 1 year ago, 0 comments)
#1750 [Bug] Fix a bug in fused rope (yaox12, closed 1 year ago, 0 comments)
#1749 Distributed optimizer support for contiguous param buffer with FP8 params (timmoon10, closed 1 year ago, 1 comment)
#1748 Whether to support Cuda 12.1 (yangzhipeng1108, opened 1 year ago, 7 comments)
#1747 Misc Changes (nWEIdia, closed 1 year ago, 1 comment)
#1746 A fused `apply_rotary_pos_emb` implementation for Megatron-Core (yaox12, closed 1 year ago, 0 comments)
#1745 More Precision Combinations For GroupNorm (alpha0422, closed 1 year ago, 0 comments)
#1744 GPU memory leak with Flair and APEX (astropic, opened 1 year ago, 0 comments)
#1743 Fix `rtol` in `assert_close` cleanup (eqy, closed 1 year ago, 0 comments)
#1742 Cleanup usage of `self.assertTrue(torch.allclose(...` (eqy, closed 1 year ago, 0 comments)
#1741 ninja: error: '/app/csrc/amp_C_frontend.cpp', needed by '/app/build/temp.linux-x86_64-cpython-310/csrc/amp_C_frontend.o', missing and no known rule to make it (Tolga-Karahan, closed 1 year ago, 0 comments)
#1740 Loop through all available engines for cuDNN heuristics search (minitu, closed 1 year ago, 1 comment)
#1739 add test for openfold triton mha kernel (azazhu, closed 1 year ago, 0 comments)
#1738 error: command '/usr/local/cuda-11.3/bin/nvcc' failed with exit code 1 (Brion112233, opened 1 year ago, 1 comment)
#1737 When doing pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./, shows ModuleNotFoundError: No module named 'packaging' (lainmn, opened 1 year ago, 8 comments)
#1736 fused_layer_norm_cuda.rms_forward_affine gives runtime error when run on cuda:1 (Kushdesh, opened 1 year ago, 0 comments)
#1735 Installation fails (due to recent change?) (hector-gr, opened 1 year ago, 14 comments)
#1734 Add openfold triton code (ar-nowaczynski, closed 1 year ago, 0 comments)
#1733 Add hysteresis support for AMP gradient scale update (minitu, closed 1 year ago, 1 comment)
#1732 Rui/dev fast ln (RuiWang1998, closed 1 year ago, 0 comments)
#1731 Use master weights for bfloat16 FusedAdam when master_weights=True (cbcase, opened 1 year ago, 2 comments)
#1730 is it possible to update `conda-forge/nvidia-apex` to a recent tag? (stas00, closed 1 year ago, 2 comments)
#1729 torch1.13.1 cuda11.6 python3.8 TypeError: unsupported operand type(s) for +: 'NoneType' and 'str' (yuhuai4554, opened 1 year ago, 2 comments)
#1728 FusedAdam doesn't allocate master weights for bfloat16 (cbcase, opened 1 year ago, 2 comments)
#1727 Add multi_tensor_unscale_l2norm_cuda (minitu, closed 1 year ago, 1 comment)
#1726 Rui/dev fast ln (RuiWang1998, closed 1 year ago, 0 comments)
#1725 Option to only build `amp_C` module (ezhang887, opened 1 year ago, 0 comments)
#1724 torch2.0.1 No module named 'torch._six' (darrenwang00, opened 1 year ago, 13 comments)
#1723 Distributed optimizer infrastructure for FP8 parameters (timmoon10, closed 1 year ago, 0 comments)
#1722 Apex is not correctly built for pytorch 2.1.0 (acphile, opened 1 year ago, 2 comments)
#1721 Distributed optimizer support for multiple dtypes (timmoon10, closed 1 year ago, 0 comments)
#1720 [contrib.xentropy] bfloat16 support (crcrpar, closed 1 year ago, 0 comments)
#1719 Return distributed optimizer checkpoint on all ranks (timmoon10, closed 1 year ago, 0 comments)
#1718 Adjusting test for ONNX opset 18 (now default) (borisfom, closed 1 year ago, 0 comments)
#1717 ModuleNotFoundError: No module named 'fast_multihead_attn' (ICENacl, opened 1 year ago, 4 comments)
#1716 Include format version in distopt checkpoints (timmoon10, closed 1 year ago, 0 comments)
#1715 Massively reduce LayerNorm/RMSNorm GPU memory usage in modern networks by tricking torch autograd (RuiWang1998, closed 1 year ago, 11 comments)
#1714 Add the warning of distributed_fused_adam low bucket usage (shjwudp, closed 1 year ago, 1 comment)
#1713 Update GroupNorm for 16 Groups (alpha0422, closed 1 year ago, 1 comment)
#1712 Apex Tensor Parallelism and LoRA (conceptofmind, closed 4 months ago, 4 comments)
#1711 [Transformer][Test] Skip UccP2PCommTest on single GPU (Aidyn-A, closed 1 year ago, 0 comments)
#1710 AttributeError: module 'apex.amp' has no attribute 'state_dict' (caoren-shuai, opened 1 year ago, 2 comments)
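Several issues above report missing compiled apex extension modules (#1757 `amp_C`, #1755 `fused_layer_norm_cuda`, #1717 `fast_multihead_attn`), which typically indicates apex was installed without its C++/CUDA extensions built. A minimal diagnostic sketch, assuming only that the module names quoted in those issue titles are the importable extension names:

```python
import importlib.util

# Compiled-extension module names taken from the issue titles above.
EXT_MODULES = ["amp_C", "fused_layer_norm_cuda", "fast_multihead_attn"]

def missing_apex_extensions(names=EXT_MODULES):
    """Return the subset of module names that cannot be found in the
    current environment (find_spec returns None for absent modules)."""
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    for name in missing_apex_extensions():
        # A missing entry suggests apex was built without the
        # --cpp_ext/--cuda_ext options mentioned in issue #1737.
        print(f"missing compiled extension: {name}")
```

This only checks importability; it does not verify that the extensions were built against the running CUDA/PyTorch versions, which is the subject of issues such as #1722 and #1748.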