-
I am running llama_cpp version 0.2.68 on Ubuntu 22.04 LTS in a conda environment. Attached are two Jupyter notebooks with ONLY one line changed (use CPU vs. GPU). As you can see, for the exact same environ…
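For context, a minimal sketch of the kind of one-line difference described here, assuming the notebooks use llama-cpp-python's `Llama` constructor (the model path is illustrative):
```python
from llama_cpp import Llama

# CPU-only notebook: keep every layer on the CPU
llm_cpu = Llama(model_path="./model.gguf", n_gpu_layers=0)

# GPU notebook: the single changed line, offloading all layers to the GPU
llm_gpu = Llama(model_path="./model.gguf", n_gpu_layers=-1)
```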
-
```
nproc_per_node=4
CUDA_VISIBLE_DEVICES=0,1,2,3 \
NPROC_PER_NODE=$nproc_per_node \
swift sft \
  --model_id_or_path "AI-ModelScope/llava-v1.6-mistral-7b" \
  --template_type "llava-mistral-inst…
```
-
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model = AutoModelForCausalLM.from_pretrained("met…
```
-
### System Info
```
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4…
```
-
### The Feature
Please add the method and proxy support for the NVIDIA API, which has this example code:
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_ke…
```
-
## Describe the bug
If the number of device layers exceeds the number of layers in the model, the number of host layers to assign seems to wrap/overflow instead of being the expected `0`, as sketched below.
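A minimal sketch of the suspected arithmetic, assuming the host-layer count comes from an unsigned subtraction; the names and the 64-bit width here are assumptions, not the project's actual code:
```python
total_layers = 32    # layers in the model
device_layers = 40   # user requested more device layers than the model has

# Expected behaviour: clamp the host-layer count at zero
host_layers = max(total_layers - device_layers, 0)  # -> 0

# Suspected behaviour: the subtraction wraps like unsigned 64-bit arithmetic
wrapped = (total_layers - device_layers) % 2**64    # -> 18446744073709551608
```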
**NOTE:** With `llama-cpp` you can confi…
-
https://github.com/huggingface/transformers/blob/965cf677695dd363285831afca8cf479cf0c600c/src/transformers/models/mistral/modeling_mistral.py#L120-L121
https://github.com/huggingface/transformers/blo…
-
### Duplicates
- [X] I have searched the existing issues
### Summary 💡
Currently, the AutoGPT app assumes the underlying LLM supports OpenAI-style function calling. Even though there is a config var…
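For reference, the OpenAI-style function-calling request shape being assumed looks roughly like this (a minimal sketch; the model name and tool schema are illustrative):
```python
from openai import OpenAI

client = OpenAI()

# The app assumes the backend accepts a `tools` schema and returns
# structured `tool_calls`, per the OpenAI chat completions API.
resp = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
)
print(resp.choices[0].message.tool_calls)
```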
-
**Describe the bug**
This is not a bug, but there are no other headings (e.g. usage) to select for this issue.
The response time is long; are there any settings that can make it respond faster?
*…
-
With no change, I run out of memory (A100 w/ 24GB). Setting it to anything other than the default causes the following error:
```
Exception in ModelRpcClient:
Traceback (most recent call last):
…
```