-
After updating the TGI version to
ghcr.io/huggingface/text-generation-inference:latest-intel-cpu
the codegen test failed with the following 2 models:
ise-uiuc/Magicoder-S-DS-6.7B
m-a-p/OpenCodeInterpr…
-
For background, see the [README](https://github.com/meedstrom/eva) for all the theory.
Current questions on the stats theory
### Re. the model for realtime guesses:
- [ ] What kind of model can it b…
-
In the inner loop of FlashAttention-2, each computation of O requires a computation of V. I adopted a different implementation approach. For each block Q, after calculating the complete attention scor…
-
# Problem and a possible fix
I'm using flash_attn==2.3.3 to load my fine-tuned Llama 2 model (13B), but I get an error when using flash_attn. In /flash_attn/bert_padding.py#L41 there is an error: IndexError:…
-
### System Info
Transformers version 4.41.2
Platform: Ubuntu 22.04.4 LTS
Python: 3.10.14
### Who can help?
@younesbelkada @ArthurZucker
### Information
- [ ] The official example s…
-
### Issue type
Bug
### Have you reproduced the bug with TensorFlow Nightly?
Yes
### Source
binary
### TensorFlow version
tf 2.17.0
### Custom code
No
### OS platform and distribution
Ubuntu…
-
I am wondering what's the best way to use efficient implementations of attention. PyTorch provides the experimental [`torch.nn.functional.scaled_dot_product_attention`](https://pytorch.org/docs/stable…
-
## ❓ Questions and Help
Here's my system: a Docker image with GPU support, running Ubuntu 18.04
```
(base) root@43a59b70d445:/app/scene-graph-benchmark# nvidia-smi
Thu Sep 21 11:57:45 2023
+-------…
-
I want to train a Target Speaker Extraction model on the LibriMix dataset, but I found that snr_loss and the final loss (equal to snr_loss) are always 0.000e+00. Here is my training log:
node7:0/6] 2023-03-27 16:…
-
Hello DoWhy team.
Congrats on the great work on this package! I wonder if you would be interested in a contribution to the package. First, a brief intro.
In a series of papers with co-authors ([…