-
Hi! I noticed that when querying the same prompt twice with the HF APIs (guanaco-33b), the response is cached and returned immediately (a virtual 400 tps), whereas new requests run at around 20 to 30 tp…
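For what it's worth, the Inference API caches identical inputs by default, which would explain the near-instant second response. A minimal sketch of forcing a fresh generation via the documented `use_cache` option (the model id and token below are placeholders):

```python
import requests

# Placeholder endpoint and token; substitute the actual guanaco-33b repo id.
API_URL = "https://api-inference.huggingface.co/models/timdettmers/guanaco-33b"
headers = {"Authorization": "Bearer hf_xxx"}

payload = {
    "inputs": "The same prompt as before",
    # The API serves repeated inputs from a cache layer; disabling it
    # forces a real generation on every call.
    "options": {"use_cache": False},
}
response = requests.post(API_URL, headers=headers, json=payload)
print(response.json())
```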
-
```
ERROR: Cannot install -r requirements.txt (line 30) and huggingface-hub==0.13.4 because these package versions have conflicting dependencies.
The conflict is caused by:
The user requested hugg…
```
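One way out, sketched below under the assumption that the pin on `huggingface-hub` is the movable one, is to relax the exact version so pip's resolver can reconcile it with whatever line 30 of requirements.txt requires (the upper bound is illustrative):

```
# requirements.txt (sketch): replace the hard pin with a range
huggingface-hub>=0.13.4,<0.15
```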
-
[TheBloke](https://huggingface.co/TheBloke)'s (Tom Jobbins's) _Wizard-Vicuna-Uncensored_ models are performing very well for their size on the [Open LLM Leaderboard](https://huggingface.co/spaces/Hugg…
-
When attempting to split the model on multiple GPUs, I get the following error:
```
> python test_chatbot.py -d /home/john/Projects/Python/text-models/text-generation-webui/models/TheBloke_guanaco…
```
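In case it helps while debugging, here is a minimal sketch of sharding a model across several GPUs with transformers/accelerate; the path and per-device memory caps are illustrative, and the script above may well use a different loader:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "models/TheBloke_guanaco-33B"  # illustrative local path

# device_map="auto" lets accelerate place layers across all visible GPUs;
# max_memory caps each device so the remaining layers spill onto the next.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    max_memory={0: "20GiB", 1: "20GiB"},
)
tokenizer = AutoTokenizer.from_pretrained(model_path)
```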
-
When running "guanaco_7B_demo_colab.ipynb" with load_in_4bit=True, I hit a ValueError: "Cannot merge LORA layers when the model is loaded in 8-bit mode".
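For context, peft refuses to merge adapters into a quantized base model. A minimal sketch of the usual workaround, reloading the base in half precision before merging (the repo ids are assumptions, not taken from the notebook):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model WITHOUT load_in_4bit/load_in_8bit so the LoRA
# deltas can be folded into the base weights.
base = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b", torch_dtype=torch.float16
)
model = PeftModel.from_pretrained(base, "timdettmers/guanaco-7b")
merged = model.merge_and_unload()  # plain model with the adapter baked in
```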
-
```
[nano@archlinux Chinese-Vicuna]$ python interaction.py
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your …
```
-
Hi, I'm not very familiar with multi-GPU training.
I have a machine with 8 A100s; what should I do to run full-parameter SFT on a llama2-7B model?
How do I use the trl tool?
Thanks.
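A minimal sketch of one answer, using trl's SFTTrainer with plain data parallelism (the dataset, hyperparameters, and output path are placeholders; the signature matches trl ~0.7):

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

model_name = "meta-llama/Llama-2-7b-hf"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",  # column holding the raw text
    max_seq_length=1024,
    args=TrainingArguments(
        output_dir="llama2-7b-sft",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=2e-5,
        bf16=True,  # A100s support bfloat16
    ),
)
trainer.train()
```

Saved as train_sft.py, this would be launched once per GPU with `accelerate launch --num_processes 8 train_sft.py`; accelerate then handles the data-parallel gradient synchronization across the eight A100s.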
-
I tried to train the Falcon-7b model based on the tutorial from huggingface (https://colab.research.google.com/drive/1BiQiw31DT7-cDp1-0ySXvvhzqomTdI-o?usp=sharing) with my own dataset.
When I loaded …
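If the truncated part concerns loading the custom dataset, a minimal sketch of pointing datasets at local files instead of the hub dataset used in the colab (the path and format are assumptions):

```python
from datasets import load_dataset

# The tutorial loads a hub dataset; for your own data, load local JSON
# lines with a "text" column (path and column name are illustrative).
dataset = load_dataset("json", data_files="my_dataset.jsonl", split="train")
```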
-
# Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [X] I am running the latest code. Development is very rapid so there are no tagged versions as of…
-
Hello! I was trying out GPU offload on an M1 Max with 32 GB of RAM to see whether it would speed things up. Replies are indeed generated faster (about 3× faster, I think), but they are nonsensical…
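For reference, a minimal sketch of toggling Metal offload through llama-cpp-python, which makes it easy to compare CPU-only and offloaded output (the model path and layer count are illustrative):

```python
from llama_cpp import Llama

# n_gpu_layers moves that many transformer layers onto the GPU;
# set it to 0 to keep everything on the CPU for comparison.
llm = Llama(model_path="models/7B/ggml-model-q4_0.bin", n_gpu_layers=32)
print(llm("Hello", max_tokens=32)["choices"][0]["text"])
```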