NVIDIA/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/index.html
Apache License 2.0
1.6k stars, 255 forks
issues
#933 Can't install fatal error: <path-to-conda-env>/lib/python3.8/site-packages/torch/include/ATen/ops/argmax.h: No such file or directory (saurabh-kataria, closed 1 week ago, 2 comments)
#932 Remove leftover implementations for optional userbuffers support (ksivaman, closed 2 weeks ago, 0 comments)
#931 [Common] Remove CheckTensor if the workspace is empty in cast_transpose_fused (phu0ngng, closed 2 weeks ago, 2 comments)
#930 How to install with CuDNN 9.0+ ? (tianyan01, opened 2 weeks ago, 2 comments)
#929 Apply formatting (ksivaman, closed 2 weeks ago, 0 comments)
#928 installation failed due to demand of old flash attention (saurabh-kataria, closed 2 weeks ago, 0 comments)
#927 [C/PyTorch] Simplify THD offset tensors (cyanguwa, closed 2 weeks ago, 10 comments)
#926 A hot fix to disable CE deadlock check (shamisp, closed 2 weeks ago, 3 comments)
#925 Revert PR757 (vasunvidia, closed 2 weeks ago, 0 comments)
#924 Add more type of change to the PR template (phu0ngng, closed 2 weeks ago, 0 comments)
#923 [Feature Request][PyTorch] Support thd format for fp8 tensors in DotProductAttention (alexdremov, opened 2 weeks ago, 0 comments)
#922 How to use FP8 of TransformerEngine in inference (Godlovecui, opened 2 weeks ago, 2 comments)
#921 [PyTorch] reverting autocast API back to PyTorch v2.3.1 and below (denera, closed 2 weeks ago, 0 comments)
#920 [Draft] Zero fwd and bwd results for THD+CP (xrennvidia, opened 2 weeks ago, 0 comments)
#919 Add auto-formatter (ksivaman, closed 2 weeks ago, 1 comment)
#918 Compiling on Slurmcluster fatal error: cudnn.h: No such file or directory (windprak, opened 2 weeks ago, 3 comments)
#917 [PyTorch] Adjust checkpointing of FP8 metadata for attention (cyanguwa, closed 2 weeks ago, 3 comments)
#916 [PyTorch] Fixed assert on primary Fp8 weights in `prepare_te_modules_for_fsdp()` (denera, closed 2 weeks ago, 1 comment)
#915 remove code duplication in a test (rybakov, closed 2 weeks ago, 1 comment)
#914 Add the option to use SM for P2P comm in TP overlap (erhoo82, closed 2 weeks ago, 4 comments)
#913 Fix TE assert weight error (j316chuck, closed 2 weeks ago, 2 comments)
#912 Value initialize packing descriptors (keshavb96, closed 3 weeks ago, 1 comment)
#911 Fix local cpp tests after inplace build (ksivaman, closed 2 weeks ago, 1 comment)
#910 Value initialize all descriptors (keshavb96, closed 3 weeks ago, 0 comments)
#909 The precision is not aligned by the index (Amanda-Barbara, opened 3 weeks ago, 0 comments)
#908 What is padding_causal? (1049451037, opened 3 weeks ago, 0 comments)
#907 [PyTorch] Expose `multi_tensor_*` kernels (yaox12, closed 2 weeks ago, 4 comments)
#906 Undefined symbols when installed with public PyTorch (C++11 ABI issue) (borisfom, closed 2 weeks ago, 2 comments)
#905 disable using nvfuser when pytorch version >= 2.2 (sudhakarsingh27, closed 2 weeks ago, 2 comments)
#904 remove code duplication in test_onnx_export.py (rybakov, closed 3 weeks ago, 0 comments)
#903 [Common] Added JIT-compiled fused cast transpose kernels (Oleg-Goncharov, closed 2 weeks ago, 6 comments)
#902 [JAX] Made order of gated act consistent in all branches (phu0ngng, closed 3 weeks ago, 1 comment)
#901 [C/PyTorch] Removed MPI dependence in Userbuffers (denera, closed 2 weeks ago, 2 comments)
#900 [Feature Request] Integration of DiT components into TransformerEngine. (okotaku, opened 3 weeks ago, 0 comments)
#899 [JAX] Splitting cpp_extensions.py (phu0ngng, closed 2 weeks ago, 4 comments)
#898 Fix minor security vulnerability when triggering CI (timmoon10, closed 3 weeks ago, 0 comments)
#897 Change `norm_factor` into `softmax_scale` and add kwarg into `DotProductAttention` (BoxiangW, closed 2 weeks ago, 8 comments)
#896 Make transformer_engine::getenv arguments independent of C++ ABI version (ksivaman, closed 3 weeks ago, 3 comments)
#895 [PyTorch] Check and set sliding window size based on attention mask type (cyanguwa, closed 10 hours ago, 4 comments)
#894 [PyTorch] Disabling TorchDynamo for TE activation checkpoint wrapper (denera, closed 2 weeks ago, 1 comment)
#893 [PaddlePaddle] Fix editable build for paddle (ksivaman, closed 3 weeks ago, 1 comment)
#892 Remove interval arg from recipe (ksivaman, closed 3 weeks ago, 1 comment)
#891 Upgrade pytest version (ksivaman, closed 3 weeks ago, 2 comments)
#890 Passing context_fn to torch.utils.checkpoint results in errors when using torch.compile (MaciejBalaNV, closed 2 weeks ago, 5 comments)
#889 Add documentation for dot product attention (cyanguwa, closed 2 weeks ago, 9 comments)
#888 Get CMake bin dir from CMake module if possible (timmoon10, closed 3 weeks ago, 2 comments)
#887 Error in installing (ziyang-arch, closed 2 weeks ago, 6 comments)
#886 Use unoptimized RMSNorm kernel if pointers are not aligned (timmoon10, closed 2 weeks ago, 2 comments)
#885 [PyTorch] Add support for cuDNN FusedAttention + THD + CP (xrennvidia, closed 3 weeks ago, 4 comments)
#884 [Common] Fused cast transpose kernels refactoring (Oleg-Goncharov, closed 3 weeks ago, 5 comments)