-
### System Info
Hi everyone, when trying to update from Llama 3 8B Instruct to Llama 3.1 8B Instruct, I noticed a crash:
```bash
Args {
model_id: "meta-llama/Meta-Llama-3.1-8B-Instruct",
…
```
-
Hi:
we're trying to run summarization with a SmoothQuant Llama model, but the following was reported:
```
Loading checkpoint shards: 100%|████████████████████████████████████████████████████| 3/3 [00:13<…]
…>=4.36 and torch>=2.1.1 t…
```
-
### Describe the issue as clearly as possible:
On certain prompts, the LLM can spiral into an infinite loop, producing the same item repeatedly until stopped by the max_tokens parameter.
In that case, t…
ea167 updated
2 months ago
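For the repetition issue above, one common workaround (a minimal sketch, not taken from any particular library — the function name, window size, and repeat count are illustrative assumptions) is to stop decoding early once the tail of the output keeps repeating the same n-gram:

```python
# Sketch: detect when the tail of a generated token sequence is the same
# n-gram repeated back-to-back, so the caller can stop decoding before
# max_tokens is reached. `n` and `times` are illustrative defaults.

def repeats_last_ngram(token_ids, n=8, times=3):
    """Return True if the trailing n-gram occurs `times` times in a row."""
    if len(token_ids) < n * times:
        return False
    tail = token_ids[-n:]
    # Compare each of the last `times` windows of length n against the tail.
    return all(
        token_ids[-(i + 1) * n : -i * n or None] == tail
        for i in range(times)
    )
```

In a sampling loop you would call this after each generated token and break out of generation when it returns True.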
-
### What happened?
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
/owner/ninth/llama.cpp/ggml/src/ggml-cann.cpp:61: CANN error: E89999: In…
-
### Bug Description
I'm deploying a web app on PythonAnywhere, which is a hosting service for Python web apps on Linux machines.
I've created a virtual env with virtualenvwrapper and set it up in the web a…
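For context, a typical virtualenvwrapper setup on PythonAnywhere looks roughly like the sketch below; the env name `myapp-env`, the Python version, and the requirements file are illustrative assumptions, not details from the report:

```shell
# Create and activate a named virtualenv (illustrative name and version)
mkvirtualenv myapp-env --python=python3.10
workon myapp-env

# Install the app's dependencies into it (assumed requirements file)
pip install -r requirements.txt

# Then point the web app's "Virtualenv" field at the env's path, e.g.:
#   /home/<username>/.virtualenvs/myapp-env
```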
-
So far I've ported the following models to Java:
Llama 3 & 3.1, Mistral/Codestral/Mathstral/Nemostral (+ Tekken tokenizer), Qwen2, Phi3 and Gemma 1 & 2 ...
All models are bundled as a single ~2K li…
mukel updated
1 month ago
-
Hi, I tried to test Llama 2 on TensorRT-LLM.
My environment (based on "nvcr.io-nvidia-tritonserver-23.10-trtllm-python-py3"):
> cuda 12.2
> gpu A100 40G (1)
> python 3.10.12
> ubunt…
-
I experimented with LLaMA 2.
I want to **replicate multiple experiments**. To begin with, I ran the demo and obtained the following results.
My results:
![image](https://github.com/user-at…
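When replicating experiments like the one above, run-to-run drift often comes from unseeded RNGs. A minimal sketch, assuming a PyTorch-style setup (the helper name and the exact set of flags are my own, and version-dependent):

```python
# Sketch: pin the RNG seeds that usually vary between runs. Torch is
# treated as optional so the helper also works in CPU-only environments.
import random

import numpy as np

try:
    import torch  # only needed if the demo runs on PyTorch
except ImportError:
    torch = None


def seed_everything(seed: int = 42) -> None:
    """Seed Python, NumPy, and (if present) PyTorch RNGs."""
    random.seed(seed)
    np.random.seed(seed)
    if torch is not None:
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # no-op without a GPU
        # Trade speed for determinism in cuDNN kernels.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
```

Note that seeding alone may not make GPU decoding bit-identical; nondeterministic CUDA kernels and sampling temperature also matter.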
-
**Describe the bug**
llama-server exited with status code -1
**Information about your version**
Unable to get version as it will not start. Docker image used:
```
REPOSITORY …
```
-
### System Info
```Shell
latest version. tested via both `pip install -U accelerate` and `pip install git+https://github.com/huggingface/accelerate`
```
### Information
- [ ] My own modif…