-
Hello! I was trying GPU offload on an M1 Max with 32 GB of RAM to see whether it would speed things up. Replies are indeed generated faster (about three times as fast, I think), but they are nonsensical…
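For context, a minimal way to reproduce the CPU-vs-Metal comparison with the llama-cpp-python bindings (an assumption; the report may have used the llama.cpp CLI directly, and the model path here is illustrative):
```python
# Sketch: compare output quality with and without Metal offload.
# n_gpu_layers=1 enables Metal on Apple Silicon; n_gpu_layers=0 stays on CPU.
from llama_cpp import Llama

for n_gpu_layers in (0, 1):
    llm = Llama(model_path="./models/7B/model.q4_0.bin", n_gpu_layers=n_gpu_layers)
    out = llm("I love fish", max_tokens=64)
    print(n_gpu_layers, out["choices"][0]["text"])
```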
-
I was working off [this example](https://huggingface.co/blog/llama2#fine-tuning-with-peft) and found that the code there did not work with an environment I had already established.
Specifically…
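For orientation while the specifics above are cut off, the blog post's fine-tuning section builds on a PEFT LoRA setup of roughly this shape (a generic sketch, not the blog's exact code; the hyperparameters are illustrative):
```python
# Generic LoRA setup with peft; values are illustrative, not the blog's.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```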
-
I'm frequently getting this while attempting to use the Linux version of Koboldcpp, cloned an hour or so ago. I can successfully connect to the endpoint and enter a prompt. I can also w…
-
After running the generate.sh script, the web service starts successfully, but clicking Submit raises an error:
```
Something went wrong
Expecting value: line 1 column 1 (char 0)
```
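That message is Python's standard `json.JSONDecodeError` for an empty or non-JSON response body, which suggests the web service returned nothing usable to the Submit request:
```python
# The exact error text comes from Python's json module on an empty body.
import json

json.loads("")  # json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
```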
The script contents are as follows:
```bash
BASE_MODEL="decapoda-research/llama-7b-hf"
LORA_PATH="Chinese…
```
-
I own a MacBook Pro M2 with 32 GB of memory and am trying to run inference with a 33B model.
Without Metal (i.e., without the `-ngl 1` flag) this works fine, and 13B models also work fine both with and without Metal.
There is…
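As a rough sanity check on memory (assumed numbers: q4_0 stores about 4.5 bits per weight once block scales are counted, and Metal reportedly caps a process's GPU working set well below total RAM):
```python
# Back-of-envelope weight-memory estimate for a 4-bit 33B model.
params = 33e9
bits_per_weight = 4.5  # approximate for q4_0, including block scales
weights_gb = params * bits_per_weight / 8 / 1e9
print(f"~{weights_gb:.1f} GB for weights alone")  # ~18.6 GB, before KV cache/scratch
```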
-
```
LLAMA_METAL=1 make -j && ./main -m ./models/guanaco-7B.ggmlv3.q4_0.bin -p "I love fish" --ignore-eos -n 1024 -ngl 1
llama_print_timings: load time = 7918.69 ms
llama_print_timings: sa…
```
-
With the `SFTTrainer` it's unclear to me how to instruction-tune. I might be missing relevant details, but the examples I've seen look like they are fine-tuning on the prompt and response rather th…
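If the goal is to compute loss only on the response, trl provides a collator for that; a minimal sketch, where the model name, dataset, and `### Assistant:` response template are assumptions rather than details from the original post:
```python
# Sketch: response-only fine-tuning by masking prompt tokens out of the loss.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, LlamaTokenizer
from trl import SFTTrainer, DataCollatorForCompletionOnlyLM

model_name = "decapoda-research/llama-7b-hf"  # placeholder
tokenizer = LlamaTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")

# Tokens before this marker are ignored in the loss, so training targets
# only the assistant's response, not the instruction/prompt.
collator = DataCollatorForCompletionOnlyLM("### Assistant:", tokenizer=tokenizer)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    dataset_text_field="text",
    data_collator=collator,
    max_seq_length=512,
)
trainer.train()
```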
-
### Describe the bug
I didn't like one of the answers and asked for a new one, so I pressed the Regenerate button, but I got the same text back. I set a seed value or a Generation preset, and the same text …
-
### Describe the bug
I've fine-tuned the "decapoda-research/llama-7b-hf" model on a cloud GPU and got my adapter_model.bin and adapter_config.json files. What I want now is to run the quantized 4-bit…
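A minimal sketch of doing that with bitsandbytes 4-bit loading plus peft (paths and the prompt are illustrative; assumes the adapter files sit in `./lora-out`):
```python
# Sketch: load the base model in 4-bit and attach the trained LoRA adapter.
import torch
from transformers import AutoModelForCausalLM, LlamaTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_model = "decapoda-research/llama-7b-hf"
adapter_dir = "./lora-out"  # holds adapter_model.bin and adapter_config.json

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = LlamaTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_dir)

inputs = tokenizer("Hello!", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```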
-
Hey, first of all, amazing work. Thanks for building an open LLaMA model. As the title suggests, I would like to know whether this model could be compatible with llama.cpp.
Thanks