NVIDIA TransformerEngine issues

NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.

https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/index.html

Apache License 2.0

1.61k stars, 256 forks

issues

Upgrade pytest version

#891 ksivaman closed 4 weeks ago
2
Passing context_fn to torch.utils.checkpoint results in errors when using torch.compile

#890 MaciejBalaNV closed 3 weeks ago
5
Add documentation for dot product attention

#889 cyanguwa closed 3 weeks ago
9
Get CMake bin dir from CMake module if possible

#888 timmoon10 closed 4 weeks ago
2
Error in installing

#887 ziyang-arch closed 2 weeks ago
6
Use unoptimized RMSNorm kernel if pointers are not aligned

#886 timmoon10 closed 3 weeks ago
2
[PyTorch] Add support for cuDNN FusedAttention + THD + CP

#885 xrennvidia closed 3 weeks ago
4
[Common] Fused cast transpose kernels refactoring

#884 Oleg-Goncharov closed 4 weeks ago
5
[JAX] Splitting `csrc/modules.cpp` by category

#883 phu0ngng closed 3 weeks ago
3
[PyTorch] Replace `int8_t` in Pybind11 extensions with `int64_t`

#882 timmoon10 closed 1 month ago
1
Failed to build Transformer Engine

#881 zirui closed 1 month ago
2
Fp8 model init factory

#880 sudhakarsingh27 opened 1 month ago
3
Can't find `nvToolsExt` during build

#879 kvablack opened 1 month ago
1
[JAX] Added unit tests for distributed LayernormMLP

#878 phu0ngng closed 3 weeks ago
1
Build system refactor for wheels

#877 ksivaman closed 4 weeks ago
2
New NVIDIA footer in documentation

#876 ptrendx closed 1 month ago
0
[PyTorch] Make sure RoPE frequencies are in FP32

#875 timmoon10 closed 1 month ago
1
Add user to TE CI

#874 timmoon10 closed 1 month ago
0
Ubuntu session close during building wheel step

#873 Ciclarion closed 1 month ago
3
import transformer_engine initializes CUDA

#872 szmigacz opened 1 month ago
1
Strange behavior when import torch after import te.

#871 GGGGGGXY opened 1 month ago
1
TypeError: UbufP2PCommOverlap(): incompatible function arguments.

#870 holmes313 closed 1 month ago
1
[PyTorch] Add CUDA graph tests with FP8 weight caching

#869 timmoon10 closed 1 month ago
2
Release GIL when calling C extensions

#868 szmigacz closed 2 weeks ago
0
[PyTorch] Move FusedAdam/FusedSGD and necessary kernels from Apex to TE

#867 yaox12 closed 1 month ago
3
Port down up may cause hang when using TE in training.

#866 holmes313 opened 1 month ago
0
[PyTorch] Avoid select op in PyTorch extensions

#865 timmoon10 closed 3 weeks ago
4
[URGENT] Malware hosted somewhere in this repo

#864 andrei-cb closed 1 month ago
5
[C] Allow bias support for sm80/86/89 for cuDNN 9+

#863 cyanguwa closed 1 month ago
1
Avoid framework specific import from top level

#862 ksivaman opened 1 month ago
0
[PyTorch] Handle non-constant FP8 scales in ONNX export

#861 timmoon10 closed 1 month ago
2
[PyTorch] Replaced deprecated `pkg_resources` with `packaging`

#860 denera closed 1 month ago
0
[JAX] Fixed the shape miss-matching issue in MLP.

#859 mingxu1067 closed 1 month ago
2
[C/PyTorch/JAX] Build system improvements for rpath and C++11 ABI

#858 denera closed 4 weeks ago
10
ERROR: Failed building wheel for transformer-engine

#857 Weifan1226 closed 1 month ago
4
Cannot import and use transformer_engine after successful installation with No module named 'transformer_engine_extensions'

#856 sam-h-bean opened 1 month ago
4
Deflect pip queries when querying .so location

#855 akoumpa closed 1 month ago
3
[PyTorch] FP8 AllToAll

#854 yaox12 closed 1 month ago
1
[Common/PyTorch] Grouped GEMM via multi-stream cuBLAS

#853 yaox12 closed 1 week ago
7
Use correct FP8 group in multi-GPU docs

#852 timmoon10 closed 1 month ago
0
Revert "Import framework submodules lazily (#839)"

#851 ksivaman closed 1 month ago
0
Revert "Import framework submodules lazily"

#850 ksivaman closed 1 month ago
0
`inv_freq` of `RotaryPositionEmbedding` is hard-coded to 10k

#849 shijie-wu opened 1 month ago
1
[JAX] Fix the Failures on Partition of ActPrimitives

#848 mingxu1067 closed 1 month ago
2
[ERROR] cuBLAS error when launch training with Megatron-LM and TransformerEngine

#847 Btlmd closed 1 month ago
2
[Pytorch] Added squared ReLU implementation

#846 phu0ngng closed 1 month ago
1
[Common] Added Alignment Requirements for CuBLAS heuristics

#845 phu0ngng closed 1 month ago
5
Do not attempt importing submodules if framework is not available

#844 timmoon10 closed 1 month ago
0
[JAX] [B] Fixed Batcher in DBiasCastTranspose Primitive

#843 phu0ngng closed 1 month ago
1
[JAX] Rewrite the Format of FP8 Meta and Remove unused ShardingTypes.

#842 mingxu1067 closed 3 weeks ago
10
