-
It'd be nice to be able to use this tool with an offline, locally running, open-source LLM instead of the OpenAI APIs. You could probably use the https://github.com/rustformers/llm crate to achie…
-
Execute: `cargo run --example llama`
I get the following error:
```
Running on CPU, to run on GPU, build this example with `--features cuda`
loading the model weights from meta-llama/Llama-2-7b-hf
Error: request …
```
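The error is cut off, but since meta-llama/Llama-2-7b-hf is a gated Hugging Face repo, one plausible cause (an assumption here, given the truncated message) is an unauthenticated weight download. A minimal Python probe with `huggingface_hub` can check repo access independently of the Rust example:

```python
from huggingface_hub import hf_hub_download

# meta-llama/Llama-2-7b-hf is gated: this download only succeeds after the
# license has been accepted on the Hub and a valid token is supplied.
# Replace the placeholder token with your own (or run `huggingface-cli login`).
path = hf_hub_download(
    repo_id="meta-llama/Llama-2-7b-hf",
    filename="config.json",
    token="hf_...",  # placeholder token
)
print(f"Access OK, downloaded to {path}")
```

If this raises a 401/403, the example will keep failing until a token is configured.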
-
It'd be good to be able to bounce ideas off each other in real time, rather than through issues, for more moment-to-moment discussion. The popular choices in the Rust world are Discord and Zulip, from wh…
-
I started a new model server using `llama_cpp.server` with the following command:
```
python3 -m llama_cpp.server --model ~/dev/models/codellama-13b.Q5_K_M.gguf --n_gpu_layers 35 --n_batch 12000
```
This sta…
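For reference, `llama_cpp.server` exposes an OpenAI-compatible HTTP API, so a server started as above can be queried directly. A minimal sketch, assuming the default host and port (`localhost:8000`) and the plain `requests` library:

```python
import requests

# POST to the OpenAI-compatible completions endpoint served by llama_cpp.server.
# Host/port are the defaults; adjust if --host/--port were passed to the server.
resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "prompt": "def fibonacci(n):",
        "max_tokens": 64,
        "temperature": 0.2,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```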
-
**Describe the bug**
Out of memory when deploying TabbyML/CodeLlama-7B on Modal with the default Modal app.py script.
**Information about your version**
```
IMAGE_NAME = "tabbyml/tabby:0.5.4"
MODEL_…
```
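For context, a 7B model needs roughly 14 GB for fp16 weights alone, so an OOM here usually means the requested GPU is too small. One way to address that is to ask Modal for a larger GPU in the function decorator. A minimal, hypothetical sketch (this is not Tabby's actual app.py, and the launch command is an assumption), assuming the Modal Python SDK:

```python
import modal

# Hypothetical Modal app, not Tabby's real app.py.
app = modal.App("tabby-codellama-7b")
image = modal.Image.from_registry("tabbyml/tabby:0.5.4")

@app.function(
    image=image,
    gpu="A100",  # request a larger-memory GPU than the default to avoid the OOM
    timeout=600,
)
def serve():
    import subprocess
    # Launch the Tabby server inside the container (command is illustrative).
    subprocess.run(
        ["tabby", "serve", "--model", "TabbyML/CodeLlama-7B", "--device", "cuda"],
        check=True,
    )
```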
-
Command:
```
lm_eval --model vllm \
    --model_args pretrained=${MODELDIR},tokenizer_mode="slow",tensor_parallel_size=$NUM_GPU,dtype=auto,gpu_memory_utilization=0.8 \
    --tasks arc_challenge \
    …
```
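The same evaluation can also be driven from Python, which can make failures easier to debug. A minimal sketch, assuming lm-evaluation-harness's `simple_evaluate` API, with placeholder values standing in for `${MODELDIR}` and `$NUM_GPU`:

```python
import lm_eval

MODEL_DIR = "/path/to/model"  # placeholder for ${MODELDIR}
NUM_GPU = 2                   # placeholder for $NUM_GPU

# Python equivalent of the CLI invocation above.
results = lm_eval.simple_evaluate(
    model="vllm",
    model_args=(
        f"pretrained={MODEL_DIR},tokenizer_mode=slow,"
        f"tensor_parallel_size={NUM_GPU},dtype=auto,gpu_memory_utilization=0.8"
    ),
    tasks=["arc_challenge"],
)
print(results["results"]["arc_challenge"])
```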
-
### System Info
Running on CPU.

#### CPU Details
```
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Ord…
```
-
### System Info
I deploy meta-llama/Llama-2-13b-chat-hf with TGI and see roughly 80 GB allocated, as reported by nvidia-smi:
```
+-----------------------------------------------------------------------…
```
-
### 🐛 Describe the bug
When running train.sh in colossal-llama2 under the application directory, I get the following error:
```
bash train.sh
/data_lc/envs/coloai/lib/python3.10/site-packages/colossalai/initialize.py:48: UserWarning: `config…
```
-
### Reminder
- [X] I have read the README and searched the existing issues.
### Reproduction
I have started with QLoRA on DPO datasets and would like to continue on PT datasets.
What parameters…