-
https://arxiv.org/abs/2009.06732
-
1. Public code and paper link:
I have installed the following code: https://github.com/AILab-CVC/GroupMixFormer
Paper link: https://arxiv.org/abs/2311.15157
2. What does this work d…
-
Great work!
Currently, I am reproducing this work. I found that the `LlamaForCausalLM` used in the repository is out of date, and its memory cost is much higher than that of the `LlamaForCausalLM` from Hug…
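As a rough sanity check on such memory comparisons, a back-of-envelope estimate (my own sketch, not the repository's actual numbers) relates weight footprint to parameter count and dtype width:

```python
# Back-of-envelope sketch (not the repository's actual numbers): weight-only
# memory for a causal LM, given a parameter count and bytes per parameter.
def weight_memory_gb(n_params, bytes_per_param):
    """Return the raw weight footprint in GiB (ignores activations and KV cache)."""
    return n_params * bytes_per_param / 1024**3

# A hypothetical 7B-parameter model: fp32 vs. fp16 weights.
fp32_gb = weight_memory_gb(7e9, 4)  # ~26 GiB
fp16_gb = weight_memory_gb(7e9, 2)  # ~13 GiB
```

Activations, optimizer state, and the KV cache come on top of this, which is where an outdated implementation can diverge most.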
-
### Is your feature request related to a problem?
The sequential table transformer (#802) is great if later transformations depend on prior ones. Often, however, columns are transformed independently…
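To illustrate the request, a minimal sketch of independent per-column transforms (illustrative names, not this library's API) — each callable sees only its own column, so the transforms could in principle run in parallel:

```python
# Hypothetical sketch: apply each column's transform independently,
# with no dependence between columns (illustrative, not the library's API).
def transform_columns(table, transforms):
    """table: dict of column name -> list of values;
    transforms: dict of column name -> callable applied elementwise."""
    return {
        col: [transforms[col](v) for v in values] if col in transforms else values
        for col, values in table.items()
    }

table = {"price": [10.0, 20.0], "name": ["a", "b"]}
out = transform_columns(table, {"price": lambda v: v * 1.1})
```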
-
Platforms: linux
This test was disabled because it is failing in CI. See [recent examples](https://hud.pytorch.org/flakytest?name=test_mem_efficient_attention_attn_mask_vs_math_ref_grads_batch_size_1…
-
Hi @AlexeyAB,
Could we have support for [DeiT](https://github.com/facebookresearch/deit)?
Thanks
-
When attempting to deploy the model to SageMaker manually via a deployment script, or automatically via the Hugging Face Inference Endpoints UI, I receive the same error:
"ValueEr…
-
### 🐛 Describe the bug
Looks like it's dispatching to efficient attention backward and failing one of the shape checks (
```
TORCH_CHECK(
    max_seqlen_k
```
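For context, the "math" reference that the efficient backend is compared against can be sketched as plain softmax attention with an additive mask (my own NumPy sketch, not PyTorch's actual implementation):

```python
import numpy as np

def ref_attention(q, k, v, attn_mask=None):
    """Math-reference scaled dot-product attention (sketch, not PyTorch's code).

    q: (..., seq_q, d), k/v: (..., seq_k, d);
    attn_mask: additive mask broadcastable to (..., seq_q, seq_k).
    """
    d = q.shape[-1]
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d)
    if attn_mask is not None:
        scores = scores + attn_mask
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

The dispatcher's shape checks (like the `max_seqlen_k` one above) guard the fused kernel; the math path has no such restriction, which is why mismatches show up as backward-pass failures only on some shapes.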
-
Fine-tuning Qwen2-57B-A14B-Instruct is extremely slow compared to fine-tuning Qwen2-72B-Instruct.
Here are the runtimes:
**Qwen/Qwen2-7B-Instruct:**
{'train_runtime': 100.8509, 'trai…
-
### Is there an existing issue for this?
- [X] I have searched the existing issues and checked the recent builds/commits
### What happened?
xformers is installed and available in my conda env yet n…