-
While serving the CodeLlama 13B (`CodeLlama-13b-hf`) base model via the `v1/completions` API on one A100, I encountered the following CUDA out-of-memory issue.
The same thing happened with the 34B base model, …
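For context, a rough back-of-envelope estimate shows why a single A100 is tight for these models. This is a sketch assuming fp16/bf16 weights at 2 bytes per parameter; KV cache, activations, and framework overhead come on top of the figures it prints:

```python
# Back-of-envelope estimate of GPU memory needed just for the model
# weights, assuming 2 bytes per parameter (fp16/bf16). Real serving
# also needs KV cache and activation memory on top of this.
def weight_memory_gib(n_params: float, bytes_per_param: int = 2) -> float:
    """Approximate weight memory in GiB."""
    return n_params * bytes_per_param / 1024**3

for name, n in [("13B", 13e9), ("34B", 34e9)]:
    print(f"{name}: ~{weight_memory_gib(n):.1f} GiB for weights alone")
```

By this estimate, the 13B weights alone take roughly 24 GiB, leaving little headroom for KV cache on a 40 GB A100, and the 34B weights (~63 GiB) do not fit in fp16 at all on that card.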
-
This URL is broken: https://h1alexbel.github.io/fakehub/fakehub-vitals.html
-
## Problem
The function [\_\_getitem__](https://github.com/Lightning-AI/litgpt/blob/main/litgpt/data/base.py#L77) in the `SFTDataset` computes the `input_ids` by doing `encode(prompt + response)`, bu…
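A minimal sketch of why this matters: with subword tokenizers, `encode(prompt + response)` can produce a token that spans the prompt/response boundary, so offsets computed from `encode(prompt)` alone may misalign. The greedy longest-match tokenizer and vocabulary below are toy stand-ins, purely for illustration (real BPE/SentencePiece tokenizers show the same effect):

```python
# Toy greedy longest-match tokenizer illustrating that
# encode(prompt + response) != encode(prompt) + encode(response):
# a token can span the prompt/response boundary.
# Hypothetical vocabulary, for illustration only.
VOCAB = ["Q", ":", ": ", " ", "A", " A"]

def encode(text: str) -> list[str]:
    tokens, i = [], 0
    while i < len(text):
        # Pick the longest vocab entry matching at position i.
        match = max((v for v in VOCAB if text.startswith(v, i)),
                    key=len, default=None)
        if match is None:
            raise ValueError(f"cannot tokenize {text[i:]!r}")
        tokens.append(match)
        i += len(match)
    return tokens

prompt, response = "Q:", " A"
joint = encode(prompt + response)          # ['Q', ': ', 'A']
split = encode(prompt) + encode(response)  # ['Q', ':', ' A']
print(joint, split, joint == split)
```

Here the boundary token differs (`': '` vs `':'` + `' A'`), so any label mask derived from the prompt's token count in isolation would be off by the merge at the seam.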
-
TEMPLATE """
{{- if .First}}
### System Prompt
{{ .System}}
{{- end}}
### User Message
{{ .Prompt}}
### Assistant
"""
This is supposedly the prompt template (Ollama syntax).
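The Go-template logic above can be approximated in Python to show what the rendered prompt looks like. This is only a sketch: the keyword arguments mirror Ollama's `.First`, `.System`, and `.Prompt` variables, and exact whitespace trimming (`{{-`) is not reproduced:

```python
# Python approximation of the Ollama template: the system block is
# emitted only on the first turn (.First), followed by the user
# message and the assistant header.
def render(prompt: str, system: str = "", first: bool = True) -> str:
    parts = []
    if first:
        parts.append(f"### System Prompt\n{system}\n")
    parts.append(f"### User Message\n{prompt}\n")
    parts.append("### Assistant\n")
    return "\n".join(parts)

print(render("Write a sort function.", system="You are a coder."))
```

On follow-up turns (`first=False`) the rendered prompt starts directly at the `### User Message` header.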
-
-
### Describe the issue
I exported a CodeLlama (**codellama/CodeLlama-7b-hf**) model.
I tried to optimize the float model using the ORT optimizer (https://github.com/microsoft/onnxruntime/blob/v1.17.…
-
### Reminder
- [X] I have read the README and searched the existing issues.
### Reproduction
```
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at som…
```
-
I hope to see a tinyllama example that uses a custom conversational dataset, particularly in ChatML format. Or how do I achieve this with the provided tinyllama "default" example? My goal is to demons…
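ChatML wraps each turn in `<|im_start|>`/`<|im_end|>` markers. A minimal formatter sketch follows; `to_chatml` is a hypothetical helper, not part of any litgpt API, and only the marker convention itself is taken from ChatML:

```python
# Minimal ChatML formatter: each turn becomes
# <|im_start|>ROLE\nCONTENT<|im_end|>, one turn per line group.
def to_chatml(messages: list[dict]) -> str:
    out = []
    for m in messages:
        out.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    return "\n".join(out)

convo = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi there!"},
]
print(to_chatml(convo))
```

A custom dataset for fine-tuning would then store each conversation as such a list of role/content dicts and serialize it with a formatter like this at training time.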
-
torchrun --nproc_per_node 1 llamacpp_mock_api.py \
--ckpt_dir CodeLlama-7b-Instruct/ \
--tokenizer_path CodeLlama-7b-Instruct/tokenizer.model \
--max_seq_len 128 --max_batch_size 4
…
-
### Describe the issue
Here is my code:
```python
import autogen

config_list_codellama = [
    {
        'base_url': "http://localhost:1234/v1",
        'api_key': 'NULL',
    }
]
llm_con…