-
Fuse some popular functions and automatically replace modules in an existing 🤗 transformers model with their corresponding fused modules
**APIs**
```
from pipegoose.nn import fusion
# and ot…
```
-
### 🚀 The feature, motivation and pitch
**Context**:
While using [float8 training](https://fburl.com/4mblb81i), the operators of `fp8 = cast_to_fp8(input_tensor); fp8_t = fp8.t().contiguous().t()`…
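The layout effect of the `t().contiguous().t()` pattern can be illustrated with NumPy (an analogy only; the snippet above operates on PyTorch fp8 tensors): transposing, materializing contiguously, and transposing back leaves the values unchanged but makes the transpose contiguous, i.e. the result is column-major, which is the layout GEMM libraries often require.

```python
import numpy as np

# Analogy to PyTorch's `fp8.t().contiguous().t()`: same values,
# but the memory layout becomes column-major (Fortran order),
# so the *transpose* of the result is contiguous.
x = np.arange(6, dtype=np.float32).reshape(2, 3)  # row-major (C order)
x_col_major = np.ascontiguousarray(x.T).T         # same values, column-major

assert np.array_equal(x, x_col_major)
print(x.flags["C_CONTIGUOUS"], x_col_major.flags["F_CONTIGUOUS"])  # True True
```

The extra pass over memory that this copy implies is exactly the overhead the issue is concerned with.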
-
For transformer models with small to medium-sized GEMMs, the advantages of using fp8 cuBLASLt GEMMs may be overshadowed by the additional computational overhead introduced by memory loads in the quant…
-
## 🚀 Feature
CuDNN provides flexible support for performant GEMM/conv with fp8 quantization. If Thunder introduces fp8 casts in its traces, it can benefit from cuDNN fusions.
### Motivation
Today, thu…
-
### Motivation and description
Wondering what kind of speedup can be achieved by writing GPU kernels for optimizers.
Take a look at @pxl-th's implementation of Adam below
https://github.com/Jul…
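For reference, the per-parameter math such an optimizer kernel performs is the standard Adam update (Kingma & Ba). Below is a minimal NumPy sketch of one step, not the linked Julia implementation; a fused GPU kernel would do the same element-wise work in a single pass over the parameters instead of several array-level operations.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam update; `t` is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad         # first-moment EMA
    v = beta2 * v + (1 - beta2) * grad * grad  # second-moment EMA
    m_hat = m / (1 - beta1 ** t)               # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta = np.zeros(4, dtype=np.float32)
m, v = np.zeros_like(theta), np.zeros_like(theta)
grad = np.ones_like(theta)
theta, m, v = adam_step(theta, grad, m, v, t=1)  # theta ≈ -1e-3 everywhere
```

Each step reads and writes `theta`, `m`, and `v`; a fused kernel avoids launching one kernel per intermediate array, which is where the speedup would come from.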
-
This issue is for tracking the performance of the prognostic implicit EDMF implementation.
```[tasklist]
### Tasks
- [ ] Make a reproducer for the kernels discussed here: https://github.com/CliMA/Clim…
-
## CVE-2020-12652 - Medium Severity Vulnerability
Vulnerable Library - linux-4.19.30
Apache Software Foundation (ASF)
Library home page: https://mirrors.edge.kernel.org/pub/linux/kernel/v4.x/?…
-
## CVE-2020-12652 - Medium Severity Vulnerability
Vulnerable Library - linux-4.19.313
The Linux Kernel
Library home page: https://mirrors.edge.kernel.org/pub/linux/kernel/v4.x/?wsslib=linux
F…
-
## Introduction
I am an engineer currently working on 3D model parallelism for transformers. When the tensor model parallelism (https://github.com/huggingface/transformers/pull/13726) is done, I am g…
-
FP16 and use_amp support
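The core mechanism behind fp16/AMP training is dynamic loss scaling: the loss is multiplied by a scale factor before backward so small gradients don't underflow in fp16, gradients are unscaled before the optimizer step, overflowing steps are skipped, and the scale adapts over time. A minimal pure-Python sketch of that mechanism (a hypothetical helper, not the `torch.cuda.amp.GradScaler` API):

```python
import math

class LossScaler:
    """Sketch of dynamic loss scaling for fp16 training: grow the scale
    while training is stable, back off and skip the step on overflow."""

    def __init__(self, init_scale=2.0 ** 16, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def scale_loss(self, loss):
        # Called before backward(): keeps fp16 gradients out of underflow.
        return loss * self.scale

    def step(self, grads):
        """Unscale grads; return them if finite, else None (skip the step)."""
        unscaled = [g / self.scale for g in grads]
        if any(math.isinf(g) or math.isnan(g) for g in unscaled):
            self.scale *= self.backoff_factor  # overflow: reduce the scale
            self._good_steps = 0
            return None
        self._good_steps += 1
        if self._good_steps % self.growth_interval == 0:
            self.scale *= self.growth_factor   # stable: try a larger scale
        return unscaled

scaler = LossScaler(init_scale=4.0)
scaled_loss = scaler.scale_loss(0.5)  # 2.0
grads = scaler.step([4.0, 8.0])       # [1.0, 2.0]
```

`use_amp` support would wrap the forward pass in an autocast region and route the backward/step through a scaler like the one sketched above.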