-
This issue is to track any issues found with GGMLv3 and the latest llama.cpp Python bindings.
These changes are ONLY available using the `main` tag:
`ghcr.io/nsarrazin/serge:main`
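For anyone who wants to test, a minimal sketch of pulling and running that image (the port mapping and container name are assumptions; adjust them to your own setup):

```shell
# Pull the main-tagged image that carries the GGMLv3 / latest bindings changes
docker pull ghcr.io/nsarrazin/serge:main

# Run it detached; 8008 as the web port is an assumption, check your own configuration
docker run -d -p 8008:8008 --name serge-main ghcr.io/nsarrazin/serge:main
```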
-
Hi all
Hugging Face has a max file size limit of 50GB, which is a bit annoying. This means it's not possible to upload a q8_0 GGML of a 65B model, or a float16 GGML for a 30B model.
I've had tw…
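In case it helps, a common workaround for the size limit is to split the file into chunks under 50GB before uploading and concatenate them again after download (a sketch assuming GNU coreutils; the file name and chunk size are placeholders):

```shell
# Split a large GGML file into 48GB parts: ggml-model-q8_0.bin.part-aa, -ab, ...
split -b 48G ggml-model-q8_0.bin ggml-model-q8_0.bin.part-

# After downloading all parts, reassemble the original file
cat ggml-model-q8_0.bin.part-* > ggml-model-q8_0.bin

# Optionally verify the reassembled file against a published checksum
sha256sum ggml-model-q8_0.bin
```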
-
### System Info
```
(data_quality) brando9~ $ python collect_env.py
Collecting environment information...
PyTorch version: 2.0.1
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used…
```
-
```shell
++ echo 'START TIME: Mon Jun 19 19:22:10 CST 2023'
START TIME: Mon Jun 19 19:22:10 CST 2023
++ ROOT_DIR_BASE=/Anima/saved_models/qlora_cn
++ OUTPUT_PATH=/Anima/saved_models/qlora_cn/output_16871737…
```
-
### Describe the bug
When using a small 3-8 bit model that could fit into a single GPU (3090) anyway, there are no issues, and Windows Task Manager also shows that both models receive "something" in t…
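As a point of comparison, a minimal sketch of forcing everything onto the single 3090 (the script name and arguments are placeholders; this assumes the loader respects CUDA device visibility):

```shell
# Expose only GPU 0 to the process so nothing is placed on the second card

# Windows (cmd):
set CUDA_VISIBLE_DEVICES=0
python run_inference.py --model my-quantized-model

# Linux/macOS equivalent:
CUDA_VISIBLE_DEVICES=0 python run_inference.py --model my-quantized-model
```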
-
Hello, I'm trying to run the following [HuggingFace notebook](https://colab.research.google.com/drive/1VoYNfYDKcKRQRor98Zbf2-9VQTtGJ24k?usp=sharing#scrollTo=ym9bmcpKP9XT).
The code runs fine on Col…
-
I own a MacBook Pro M2 with 32GB memory and am trying to do inference with a 33B model.
Without Metal (i.e. without the `-ngl 1` flag) this works fine, and 13B models also work fine both with and without Metal.
There is…
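For reference, the kind of commands involved (a sketch; model paths and prompt are placeholders, and this assumes a llama.cpp build from around this time where Metal is enabled via `LLAMA_METAL=1` and offloading via `-ngl`):

```shell
# Build with Metal support
LLAMA_METAL=1 make

# 13B works with or without Metal offload
./main -m ./models/13B/ggml-model-q4_0.bin -ngl 1 -p "Hello"

# 33B works without Metal, but misbehaves once -ngl 1 is added on this 32GB M2
./main -m ./models/33B/ggml-model-q4_0.bin -p "Hello"
./main -m ./models/33B/ggml-model-q4_0.bin -ngl 1 -p "Hello"
```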
-
Apparently the `--prelude-prompt-file` option does not work. It is as if the text is given to the model without any formatting, regardless of what is written in the file.
Although I may be doing somet…
-
According to https://huggingface.co/blog/4bit-transformers-bitsandbytes,
I am guessing that GPT-4 was used to evaluate the 65B fine-tuned model as reaching 99% of ChatGPT's performance?
And would you release the delta weight f…
-
```shell
(xtuner) ➜ xtuner git:(main) ✗ xtuner train internlm_7b_qlora_oasst1_512_e3
08/31 10:34:21 - mmengine - INFO -
------------------------------------------------------------
System enviro…
```