-
Use the 13B model with the following command:
```
CUDA_VISIBLE_DEVICES=1 python generate.py --model_path "decapoda-research/llama-13b-hf" --lora_path "Chinese-Vicuna/Chinese-Vicuna-lora-13b-belle-and-guanaco" --use_local 1
…
-
I am trying to understand the best way to set up prompts and the library for an interactive chat session. It looks like, based on the InteractiveModeExecute.cs example, the "bob" personality is only de…
-
baichuan-7B is an open-source large-scale pre-trained model developed by Baichuan Intelligent Technology. Based on the Transformer architecture, it is a 7-billion-parameter model trained on appr…
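The roughly 7-billion-parameter size can be sanity-checked from typical LLaMA-style dimensions. The figures below (hidden size 4096, 32 layers, FFN size 11008, 64k vocabulary) are assumptions for illustration, not the official baichuan-7B config:

```python
# Back-of-the-envelope parameter count for a LLaMA-style 7B decoder.
# All dimensions are assumed, not taken from the official baichuan-7B config.
hidden, layers, ffn, vocab = 4096, 32, 11008, 64000

embeddings = vocab * hidden   # token embedding matrix
lm_head = vocab * hidden      # output projection (assuming untied weights)
attn = 4 * hidden * hidden    # q, k, v, o projections per layer
mlp = 3 * hidden * ffn        # gate, up, down projections per layer
per_layer = attn + mlp

total = embeddings + lm_head + layers * per_layer
print(f"{total / 1e9:.2f}B parameters")  # 7.00B parameters
```

The count lands near 7B, which matches the model's name; norm layers and biases are omitted as negligible.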
-
Hi, I have recently been building a corpus of 65,000 poetry question-answer pairs (single-instruction format), using the 13B base model and continuing training from Chinese-Vicuna-lora-13b-belle-and-guanaco/checkpoint-3000 with the other_continue method. After roughly 20 epochs, the model's general ability has been lost, yet it still has not memorized the target answers. As shown below:
![企业微信截图_1684371563507](https://gi…
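A common mitigation for this kind of forgetting is to mix general-purpose instruction data back into the domain corpus rather than training on poetry Q&A alone. A minimal sketch of such interleaved sampling (function name and dataset contents are hypothetical):

```python
import random

def mix_datasets(domain, general, general_ratio=0.3, seed=42):
    """Build a mixed corpus: roughly general_ratio of examples come from
    general instruction data, the rest from the domain (poetry) data."""
    rng = random.Random(seed)
    # Number of general examples needed so they make up general_ratio of the mix.
    n_general = round(len(domain) * general_ratio / (1 - general_ratio))
    mixed = domain + rng.choices(general, k=n_general)
    rng.shuffle(mixed)
    return mixed

# Hypothetical corpora standing in for the poetry Q&A and BELLE-style data.
poetry = [{"instruction": f"poem {i}"} for i in range(700)]
belle = [{"instruction": f"general {i}"} for i in range(1000)]
mixed = mix_datasets(poetry, belle)
print(len(mixed))  # 700 domain + 300 general = 1000
```

Keeping even 20–30% general data in each epoch tends to preserve general ability while the domain answers are being learned.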
-
Running `python generate.py` produces the following output:
```
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://…
-
# Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [x] I am running the latest code. Development is very rapid so there are no tagged versions as of…
-
Traceback (most recent call last):
  File "/home/Bloom-Lora/processor/processing.py", line 108, in <module>
    instruction_dataset = instruction_dataset.map(group_text,
  File "/usr/local/python3/lib/pyth…
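Errors in this kind of batched `map` call usually come from the grouping function returning columns of unequal length. For reference, here is a pure-Python version of the standard block-grouping logic; the actual `group_text` in processing.py may differ:

```python
def group_text(examples, block_size=512):
    # Concatenate every column's token lists end to end...
    concatenated = {k: sum(examples[k], []) for k in examples}
    total_len = len(next(iter(concatenated.values())))
    # ...drop the ragged tail so every block has exactly block_size tokens...
    total_len = (total_len // block_size) * block_size
    # ...and emit equal-length columns, as datasets.Dataset.map requires.
    return {
        k: [v[i:i + block_size] for i in range(0, total_len, block_size)]
        for k, v in concatenated.items()
    }

batch = {"input_ids": [list(range(300)), list(range(300))]}
blocks = group_text(batch, block_size=256)
print(len(blocks["input_ids"]))  # 600 tokens -> 2 full blocks of 256
```

With `batched=True`, `Dataset.map` expects every returned column to have the same number of rows, which this construction guarantees.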
-
### Describe the bug
No matter what model I load, it always produces an error (wizardLM-7B-GPTQ-4bit-128g, wizard-vicuna-7b-uncensored-gptq-4bit-128g no-act-order safetensors).
### Is there an exist…
-
I was trying to do an apples-to-apples shootout of GPTQ vs the new llama.cpp k-quants (memory usage, speed, etc.) but ran into a bump with perplexity. It looks like exllama loads a jsonl-formatted versi…
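Bridging the two loaders mostly means converting the raw wikitext test split into JSON Lines. A minimal sketch, assuming exllama simply wants one `{"text": ...}` object per line (the real schema may differ):

```python
import json

def wikitext_to_jsonl(raw_text, chunk_chars=2048):
    """Split raw wikitext into fixed-size chunks, one JSON object per line.
    The {"text": ...} schema is an assumption, not exllama's documented format."""
    lines = []
    for i in range(0, len(raw_text), chunk_chars):
        lines.append(json.dumps({"text": raw_text[i:i + chunk_chars]}))
    return "\n".join(lines)

# Stand-in for the raw wikitext-2 test file contents.
raw = "= Valkyria Chronicles III =\n" + "some article text " * 200
jsonl = wikitext_to_jsonl(raw)
records = [json.loads(line) for line in jsonl.splitlines()]
print(len(records), all("text" in r for r in records))
```

For a fair comparison, both runs should also use the same context length and stride, since perplexity is sensitive to how the evaluation windows are cut.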
-
[BELLE](https://github.com/LianjiaTech/BELLE) ([multiturn_chat_0.8M](https://huggingface.co/datasets/BelleGroup/multiturn_chat_0.8M))
This releases an open corpus of 0.8M (800,000) multi-turn dialogues.
The format looks like this:
```
instruction: the instruction (here the Human/Assistant multi-turn dialogue context)
inp…