NVIDIA/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/index.html
Apache License 2.0
1.59k stars, 255 forks
issues
[PyTorch] Support dtype casting in fused adam
#977
Wong4j
opened
17 minutes ago
0
Get Stuck at Building Wheel
#976
kingformatty
opened
3 days ago
1
Update FE to 1.5.2 and miscellaneous fixes
#975
cyanguwa
opened
4 days ago
5
Add test for building without support for any DL frameworks
#974
timmoon10
opened
4 days ago
1
[PyTorch] Disable THD tests on architectures lower than sm90
#973
cyanguwa
closed
4 days ago
1
no boost in performance with Ada GPU
#972
saurabh-kataria
opened
4 days ago
0
[PyTorch] Disable THD test on architectures lower than sm90
#971
cyanguwa
closed
4 days ago
2
[PyTorch] Runtime lookup for CUDA Driver API calls in Userbuffers
#970
denera
opened
4 days ago
9
Script to run pre-commit hooks locally
#969
ksivaman
closed
4 days ago
0
[PyTorch] Fix invalid import in test for context parallelism
#968
timmoon10
closed
5 days ago
0
Replace functools cache with lru_cache
#967
timmoon10
closed
4 days ago
1
tp_overlap need tensor parallel is equal world size ?
#966
kuangdao
opened
5 days ago
2
How to cast 16/32-bit to FP8?
#965
mxjmtxrm
opened
5 days ago
3
[JAX] Add experimental internal used THD(packed) fused attn API
#964
zlsh80826
opened
5 days ago
2
[Paddle] Fix forward and backward logic of te.Linear(parallel_mode='column') to adapt DiT of PaddleMIX
#963
yumin066
opened
5 days ago
3
nan loss when training in fp8 with rotary embedding
#962
saurabh-kataria
opened
6 days ago
2
Why is the result of context-parallel DotProductAttention influenced by the random seed?
#961
LitPrice
opened
6 days ago
0
[C/PyTorch] Add support for bottom-right-diagonal causal mask
#960
cyanguwa
opened
6 days ago
0
create_communicator_grouped2 may trigger uninit value memory issue(randomly crash) when you train more iterations.
#959
anderson101866
opened
6 days ago
1
TransformerEngine setup.py fails with Python 3.8
#958
skydoorkai
closed
4 days ago
2
[Paddle][CUDAGraph] 175B GPT-3 Hybrid-Parallel Training with CUDAGraph
#957
eee4017
opened
1 week ago
4
[Paddle] Add deterministic option in DotProductAttention
#956
Wong4j
opened
1 week ago
5
AssertionError: CublasLt version 12.1.3.x or higher required for FP8 execution on Ada.
#955
saurabh-kataria
closed
6 days ago
2
TransformerEngine build fail with Conda
#954
TeddLi
closed
6 days ago
4
NaN loss issues when I switch to the Transformer Engine TransformerLayer from pytorch layer
#953
jasonkrone
opened
1 week ago
0
AttnFuncWithCP can use less memory
#952
i4never
opened
1 week ago
0
Lower memory usage during AttnFuncWithCP.forward
#951
i4never
opened
1 week ago
2
Pure bfloat16 vs. mixed precision bfloat16: what's recommended?
#950
jasonkrone
closed
5 days ago
1
Fix compilation bug with CUDA 12.1
#949
Edenzzzz
closed
5 days ago
2
how to use FusedRMSNorm?
#948
EthanChen1234
opened
1 week ago
1
Why use two streams for context parallel
#947
Edenzzzz
opened
1 week ago
2
[TE/JAX] Prototype for New XLA Custom Calls with FFI
#946
phu0ngng
opened
1 week ago
0
[PyTorch] Add option to pass kwargs to CUDA graph module
#945
timmoon10
opened
1 week ago
0
Expose `rotary_base` as an arg instead of hardcoding
#944
sudhakarsingh27
opened
1 week ago
1
Update required CMake version to 3.25
#943
timmoon10
opened
1 week ago
2
Improve JAX build tool
#942
phu0ngng
closed
6 days ago
2
backward fails after updating to main branch TE
#941
1049451037
opened
1 week ago
0
[PyTorch] Remove unnecessary check for UB support
#940
timmoon10
closed
1 week ago
1
[PyTorch] Fix tp_group_initialized error
#939
cyanguwa
closed
1 week ago
1
[PyTorch] Release GIL in PyTorch extensions
#938
timmoon10
closed
1 week ago
2
[JAX] Fixing `unused-variable` warning at TE/JAX extension compile
#937
denera
closed
1 week ago
1
[MoE][Common/PyTorch] Add permutation
#936
StudyingShao
opened
1 week ago
3
[BUG] Assertion failed: t.data.dptr != nullptr. Input x is not allocated!
#935
alexdremov
opened
1 week ago
1
'TEDotProductAttention' object has no attribute 'tp_group_initialized'
#934
1049451037
opened
1 week ago
2
Can't install fatal error: <path-to-conda-env>/lib/python3.8/site-packages/torch/include/ATen/ops/argmax.h: No such file or directory
#933
saurabh-kataria
closed
6 days ago
2
Remove leftover implementations for optional userbuffers support
#932
ksivaman
closed
1 week ago
0
[Common] Remove CheckTensor if the workspace is empty in cast_transpose_fused
#931
phu0ngng
closed
2 weeks ago
2
How to install with CuDNN 9.0+ ?
#930
tianyan01
opened
2 weeks ago
2
Apply formatting
#929
ksivaman
closed
2 weeks ago
0
installation failed due to demand of old flash attention
#928
saurabh-kataria
closed
2 weeks ago
0