-
### System Info
```Shell
- `Accelerate` version: 0.34.2
- Platform: Linux-5.4.0-45-generic-x86_64-with-glibc2.31
- `accelerate` bash location: /home/gradevski/miniconda3/envs/summary_explainer_p…
```
-
Hello author, thanks for sharing your work! While reading your paper I noticed the attention map visualizations, and I'd like to ask about the approach or any reference code for producing attention maps like these. Thanks for your reply!
![d868f961d8b4fd09e2a70aca4ee9951](https://github.com/user-attachments/assets/a2f578f3-0628-48d9-87ed-6bc6801d694f)
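For reference, attention heatmaps like this are commonly rendered by pulling out an attention weight matrix and plotting it with matplotlib. The sketch below is a generic illustration, not the author's actual code; the tensor shape, head index, and the dummy input are all placeholders.

```python
# Generic attention-map visualization sketch; assumes you can obtain an
# attention tensor of shape (heads, query_len, key_len) from your model.
import matplotlib.pyplot as plt
import torch

def plot_attention(attn: torch.Tensor, head: int = 0, title: str = "Attention map"):
    """Render one head's attention weights as a heatmap."""
    weights = attn[head].detach().cpu().numpy()  # (query_len, key_len)
    fig, ax = plt.subplots(figsize=(5, 5))
    im = ax.imshow(weights, cmap="viridis")  # rows: queries, cols: keys
    ax.set_xlabel("Key position")
    ax.set_ylabel("Query position")
    ax.set_title(title)
    fig.colorbar(im, ax=ax)
    plt.show()

# Example with random weights standing in for real model attention:
dummy_attn = torch.softmax(torch.randn(8, 16, 16), dim=-1)  # (heads, q, k)
plot_attention(dummy_attn, head=0)
```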
-
# ❓ Questions and Help
Hi all,
Debian 13
Python 3.10.12 (venv)
PyTorch 2.4.1 (ROCm)
When I try to compile xformers against PyTorch 2.4.1 (ROCm), I end up with the common "no file found at /th…
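As a first diagnostic (a sanity check, not a known fix for this particular failure), it can help to confirm that the PyTorch build the compiler sees matches the ROCm wheel you installed:

```python
# Sanity-check the installed PyTorch before building xformers from source;
# a version or path mismatch here often surfaces as missing-file build errors.
import torch

print(torch.__version__)              # expect something like 2.4.1+rocm6.x
print(torch.version.hip)              # HIP/ROCm version baked into the wheel (None on CUDA builds)
print(torch.utils.cmake_prefix_path)  # path the build system uses to locate Torch
```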
-
Could this project be of help to you? https://github.com/philipturner/metal-flash-attention
So far, metal-flash-attention has indeed provided the fastest generation speed for Stable Diffusion on macOS.
-
### Description
I am trying to fine-tune Gemma 2 on TPU and got the following error:
```
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/jax/_src/compiler.py", l…
```
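Since the traceback is cut off, one low-risk first step (just a sanity check, not a fix for this specific compile error) is to confirm that JAX actually sees the TPU devices, since mismatched jax/jaxlib/libtpu versions are a common source of low-level compilation failures:

```python
# Confirm the JAX runtime and visible TPU devices before digging into the
# compiler traceback itself.
import jax

print(jax.__version__)
print(jax.devices())  # should list TpuDevice entries on a healthy TPU VM
```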
-
Dear quanto folks,
I implemented quantization as suggested in your example script [quantize_sst2_model.py](https://github.com/huggingface/optimum-quanto/blob/main/examples/nlp/text-classification/s…
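For context, the core flow of that example boils down to optimum-quanto's `quantize`/`freeze` calls. Here is a minimal sketch of that pattern (the checkpoint name is a placeholder, and the full example also calibrates quantized activations, which is omitted here):

```python
# Minimal optimum-quanto weight-quantization sketch following the pattern of
# the referenced SST-2 example; the checkpoint below is a placeholder.
from transformers import AutoModelForSequenceClassification
from optimum.quanto import quantize, freeze, qint8

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english"
)

quantize(model, weights=qint8)  # wrap Linear layers with fake-quantized weights
freeze(model)                   # materialize the actual int8 weights
```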
-
TRT-LLM version: v0.11.0
I'm deploying a BART model with Medusa heads, and I noticed this issue https://github.com/NVIDIA/TensorRT-LLM/issues/1946, so I adapted my model with the following steps:
```
1…
```
-
I'm aware that you plan to add compatibility gradually, but I just wanted to bring [Arts and Crafts](https://modrinth.com/mod/artsandcrafts) to your attention as a candidate for eventual support.
-
### 🚀 The feature, motivation and pitch
Flash Attention 3 (https://github.com/Dao-AILab/flash-attention) has been in beta for some time. I tested it on H100 GPUs with CUDA 12.3 and also attempted a…
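For anyone evaluating it, the library exposes a FlashAttention-style functional interface; below is a minimal usage sketch assuming the familiar `flash_attn_func` signature from FlashAttention-2 (the FA3 beta ships a similar function, but its import path and signature should be checked against the repo):

```python
# Minimal FlashAttention usage sketch (FA2-style interface); requires an
# fp16/bf16-capable CUDA GPU and the flash-attn package.
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 1024, 16, 64
q = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.bfloat16)
k = torch.randn_like(q)
v = torch.randn_like(q)

out = flash_attn_func(q, k, v, causal=True)  # (batch, seqlen, nheads, headdim)
print(out.shape)
```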