-
**Describe the bug**
gemma-2-9b-it-gptq-4bit CUDA OOM on RTX 3090
**GPU Info**
```
Sun Aug 4 02:35:35 2024
+-----------------------------------------------------------------------…
-
At @onefact we have been using WASM, but this won't work for the encoder-only or encoder-decoder models I've built (e.g. http://arxiv.org/abs/1904.05342). That's because the WASM VM is for the CPU (ha…
-
When training BERT with TF 2.3, the loss would decrease and `MLM_Acc` would be non-zero.
After upgrading to TF 2.4 and using the same script, the loss does not decrease and `MLM_Acc` remains 0.0
…
-
Attempting to use this library on a **gfx1030** (6800XT) with Hugging Face transformers results in:
```
python -m bitsandbytes
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++…
-
I have been staging some updates testing the tgi-gaudi software with Llama 405B FP8. I am waiting for Habana Optimum to approve the PR, and then I will submit a PR for huggingface/tgi_gaudi and will s…
-
Thank you for the great library.
I am calling the transformer from a C# backend which can run multiple Python processes in parallel. This works fine with Spacy, for example.
However, I am having …
-
Hello,
As described on [PyTorch's blog](https://pytorch.org/blog/a-better-transformer-for-fast-transformer-encoder-inference/), since version 1.12 it is possible to have significantly faster transfor…
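A minimal sketch of the inference setup the blog post describes: on PyTorch >= 1.12, `nn.TransformerEncoder` can take the fused "fastpath" during inference when the model is in eval mode, gradients are disabled, and `batch_first=True`. The sizes below are arbitrary placeholders, not values from this issue.

```python
import torch
import torch.nn as nn

# Build a small encoder; batch_first=True is one of the fastpath requirements.
layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)
encoder.eval()  # fastpath only applies in eval mode

x = torch.randn(8, 16, 64)  # (batch, seq, d_model)
with torch.inference_mode():  # no autograd, another fastpath requirement
    out = encoder(x)
print(out.shape)  # torch.Size([8, 16, 64])
```

Whether the fastpath actually fires also depends on the layer configuration (activation, masks, etc.), so it is worth benchmarking rather than assuming.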
-
Hi.
When I use the following commands in README:
```
CUDA_VISIBLE_DEVICES=0 python -m src.benchmark --num-data 1024 --strategy seqsch --vbs --fcr --lora-path ./ckpts/vicuna-response-length-percepti…
-
**Is your feature request related to a problem? Please describe.**
Calculating properties and embeddings can be made faster on a GPU; we should make sure we have GPU tests for both.
**Describe the so…
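One common way to add GPU tests without breaking CPU-only CI is to skip them when no GPU is present. A minimal sketch using the stdlib `unittest` skip decorator; `gpu_available` and the test name are hypothetical, and a real suite would query the framework directly (e.g. `torch.cuda.is_available()`).

```python
import shutil
import unittest

def gpu_available() -> bool:
    # Crude proxy: treat the presence of nvidia-smi as "a GPU is likely usable".
    # Replace with a framework-level check in a real test suite.
    return shutil.which("nvidia-smi") is not None

class TestGpuEmbeddings(unittest.TestCase):
    @unittest.skipUnless(gpu_available(), "requires a GPU")
    def test_embeddings_match_cpu(self):
        # Placeholder body: compute embeddings on GPU and compare to CPU output.
        ...
```

A `pytest.mark.skipif` marker would be the analogous pattern in a pytest-based suite.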
-
💿 Checking dataset...
📁MyDrive/Loras/XiaoBu/dataset
📈 Found 30 images with 2 repeats, equaling 60 steps.
📉 Divide 60 steps by 4 batch size to get 15.0 steps per epoch.
🔮 There will be 10 epochs, f…
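The step arithmetic in the log above can be sketched directly; the numbers come from the log (30 images, 2 repeats, batch size 4, 10 epochs), and the variable names are just for illustration.

```python
# Reproduce the trainer's step math from the log values.
images = 30
repeats = 2
batch_size = 4
epochs = 10

steps_per_dataset_pass = images * repeats              # 30 * 2 = 60
steps_per_epoch = steps_per_dataset_pass / batch_size  # 60 / 4 = 15.0
total_steps = steps_per_epoch * epochs                 # 15.0 * 10 = 150.0
print(steps_per_epoch, total_steps)  # 15.0 150.0
```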