NVIDIA/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/index.html
Apache License 2.0
1.61k stars, 256 forks
issues
Newest
Upgrade pytest version
#891
ksivaman
closed
4 weeks ago
2
Passing context_fn to torch.utils.checkpoint results in errors when using torch.compile
#890
MaciejBalaNV
closed
3 weeks ago
5
Add documentation for dot product attention
#889
cyanguwa
closed
3 weeks ago
9
Get CMake bin dir from CMake module if possible
#888
timmoon10
closed
4 weeks ago
2
Error in installing
#887
ziyang-arch
closed
2 weeks ago
6
Use unoptimized RMSNorm kernel if pointers are not aligned
#886
timmoon10
closed
3 weeks ago
2
[PyTorch] Add support for cuDNN FusedAttention + THD + CP
#885
xrennvidia
closed
3 weeks ago
4
[Common] Fused cast transpose kernels refactoring
#884
Oleg-Goncharov
closed
4 weeks ago
5
[JAX] Splitting `csrc/modules.cpp` by category
#883
phu0ngng
closed
3 weeks ago
3
[PyTorch] Replace `int8_t` in Pybind11 extensions with `int64_t`
#882
timmoon10
closed
1 month ago
1
Failed to build Transformer Engine
#881
zirui
closed
1 month ago
2
Fp8 model init factory
#880
sudhakarsingh27
opened
1 month ago
3
Can't find `nvToolsExt` during build
#879
kvablack
opened
1 month ago
1
[JAX] Added unit tests for distributed LayernormMLP
#878
phu0ngng
closed
3 weeks ago
1
Build system refactor for wheels
#877
ksivaman
closed
4 weeks ago
2
New NVIDIA footer in documentation
#876
ptrendx
closed
1 month ago
0
[PyTorch] Make sure RoPE frequencies are in FP32
#875
timmoon10
closed
1 month ago
1
Add user to TE CI
#874
timmoon10
closed
1 month ago
0
Ubuntu session close during building wheel step
#873
Ciclarion
closed
1 month ago
3
import transformer_engine initializes CUDA
#872
szmigacz
opened
1 month ago
1
Strange behavior when import torch after import te.
#871
GGGGGGXY
opened
1 month ago
1
TypeError: UbufP2PCommOverlap(): incompatible function arguments.
#870
holmes313
closed
1 month ago
1
[PyTorch] Add CUDA graph tests with FP8 weight caching
#869
timmoon10
closed
1 month ago
2
Release GIL when calling C extensions
#868
szmigacz
closed
2 weeks ago
0
[PyTorch] Move FusedAdam/FusedSGD and necessary kernels from Apex to TE
#867
yaox12
closed
1 month ago
3
Port down up may cause hang when using TE in training.
#866
holmes313
opened
1 month ago
0
[PyTorch] Avoid select op in PyTorch extensions
#865
timmoon10
closed
3 weeks ago
4
[URGENT] Malware hosted somewhere in this repo
#864
andrei-cb
closed
1 month ago
5
[C] Allow bias support for sm80/86/89 for cuDNN 9+
#863
cyanguwa
closed
1 month ago
1
Avoid framework specific import from top level
#862
ksivaman
opened
1 month ago
0
[PyTorch] Handle non-constant FP8 scales in ONNX export
#861
timmoon10
closed
1 month ago
2
[PyTorch] Replaced deprecated `pkg_resources` with `packaging`
#860
denera
closed
1 month ago
0
[JAX] Fixed the shape miss-matching issue in MLP.
#859
mingxu1067
closed
1 month ago
2
[C/PyTorch/JAX] Build system improvements for rpath and C++11 ABI
#858
denera
closed
4 weeks ago
10
ERROR: Failed building wheel for transformer-engine
#857
Weifan1226
closed
1 month ago
4
Cannot import and use transformer_engine after successful installation with No module named 'transformer_engine_extensions'
#856
sam-h-bean
opened
1 month ago
4
Deflect pip queries when querying .so location
#855
akoumpa
closed
1 month ago
3
[PyTorch] FP8 AllToAll
#854
yaox12
closed
1 month ago
1
[Common/PyTorch] Grouped GEMM via multi-stream cuBLAS
#853
yaox12
closed
1 week ago
7
Use correct FP8 group in multi-GPU docs
#852
timmoon10
closed
1 month ago
0
Revert "Import framework submodules lazily (#839)"
#851
ksivaman
closed
1 month ago
0
Revert "Import framework submodules lazily"
#850
ksivaman
closed
1 month ago
0
`inv_freq` of `RotaryPositionEmbedding` is hard-coded to 10k
#849
shijie-wu
opened
1 month ago
1
[JAX] Fix the Failures on Partition of ActPrimitives
#848
mingxu1067
closed
1 month ago
2
[ERROR] cuBLAS error when launch training with Megatron-LM and TransformerEngine
#847
Btlmd
closed
1 month ago
2
[Pytorch] Added squared ReLU implementation
#846
phu0ngng
closed
1 month ago
1
[Common] Added Alignment Requirements for CuBLAS heuristics
#845
phu0ngng
closed
1 month ago
5
Do not attempt importing submodules if framework is not available
#844
timmoon10
closed
1 month ago
0
[JAX] [B] Fixed Batcher in DBiasCastTranspose Primitive
#843
phu0ngng
closed
1 month ago
1
[JAX] Rewrite the Format of FP8 Meta and Remove unused ShardingTypes.
#842
mingxu1067
closed
3 weeks ago
10