-
**Is your feature request related to a problem? Please describe.**
The composite version makes a lot of calls to the slow binary ops:
**Describe the solution you'd like**
Have one op call for this op…
-
### Describe the bug
I am trying to optimize the performance of https://triton-lang.org/main/getting-started/tutorials/08-grouped-gemm.html by enabling better pipelining. Currently, it has 3 nested…
-
Hi team,
we are currently adapting our training environment to use the fused attention functions. In one of our training setups, we work with batch size one and concatenate multiple documents along …
-
Write the fused image to a 3D tiff per channel, or a directory of 2D tiffs per channel
-
Are there any research articles that explain the theory behind fused_dense?
-
**Feature request:** Add a [fused multiply-add](https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation#Fused_multiply%E2%80%93add) function. It should use a hardware implementation if ava…
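
For context, here is a minimal sketch of what "fused" means numerically: the product and the sum are rounded once instead of twice. The helper name `fma_emulated` is hypothetical and is not the API being requested. It uses exact rational arithmetic rather than a hardware instruction, and it only handles finite inputs.

```python
from fractions import Fraction


def fma_emulated(a: float, b: float, c: float) -> float:
    """Emulate a fused multiply-add for finite floats.

    Hypothetical helper for illustration only: a*b + c is computed
    exactly as a rational number and rounded to a float a single time.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # one correctly rounded conversion


# Inputs chosen so that the exact product is 1 - 2**-60,
# which rounds to 1.0 when stored as a double.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

unfused = a * b + c            # product is rounded first, then the sum
fused = fma_emulated(a, b, c)  # single rounding at the end

print(unfused)  # 0.0
print(fused)    # -2**-60, about -8.67e-19
```

The unfused result loses the low-order bits of the product before the addition, while the single-rounding result keeps them. A hardware FMA instruction provides the same single-rounding behaviour without the cost of exact arithmetic.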
-
System version: Ubuntu 22.04.4 LTS
Installation method: Linux build
After a successful build, I ran:
```
root@proxy:/home/system_install_package/PaddleOCR-json/cpp# LD_LIBRARY_PATH=$LIBS ./build/bin/PaddleOCR-json -models_path="/home/system_install_pa…
-
Hello, on line 155 of `gaussian_model.py`, should `features[:, 3:, 1:] = 0.0` be `features[:, :3, 1:] = 0.0` instead?
```
def create_from_pcd(self, pcd : BasicPointCloud, cam_infos : int, spatia…
-
# ❓ Questions and Help
Hello, I am looking at the fused multi-head attention in 3rdparty/cutlass.
In cutlass/examples, fused multi head attention is upstream to xformers.
And CUTLASS said fused multi h…
-
The environment is as follows:
python 3.10.12
flash_attn 2.6.3
torch 2.4.1
apex ##source build; from https://github.com/NVIDIA/apex
cuda-12.1
transformers 4.44.2
When I executed the foll…