-
### What happened?
LLaMA 3 has been trained with an 8192-token context. When using a single slot with the llama.cpp HTTP server, that slot is assigned the full 8192-token context. However, when using multiple slots and n…
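A minimal sketch of the behavior being described, under the assumption that the server splits the total context evenly across parallel slots (`n_ctx // n_slots`); the function name is hypothetical, not part of llama.cpp's API:

```python
def per_slot_context(n_ctx: int, n_slots: int) -> int:
    """Context tokens available to each slot, assuming an even split."""
    return n_ctx // n_slots

# With a single slot, the slot gets the full trained context:
assert per_slot_context(8192, 1) == 8192
# With 4 slots, each slot only gets a quarter of it:
assert per_slot_context(8192, 4) == 2048
```

Under this assumption, raising the slot count shrinks the usable context per request unless `n_ctx` is increased proportionally.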
-
### Reminder
- [X] I have read the README and searched the existing issues.
### System Info
- `llamafactory` version: 0.9.1.dev0
- Platform: Linux-6.11.2-arch1-1-x86_64-with-glibc2.40
- Pyt…
-
I would like to use this library for in-browser web ML inference because, with the upcoming CPU support, it is better than
1. ggml.cpp (llama.cpp/whisper.cpp) - as it supports both CPU and GPU and can u…
-
I tried to run h2ogpt with this command:
`python generate.py --base_model=meta-llama/Meta-Llama-3.1-8B-Instruct --use_auth_token=...`
and it triggered the following error:
```The attention mask and the p…
-
Hi,
I have a question regarding the Hugging Face model weights.
I was trying to load some of your adapters and play with them, but I found that the adapters were very large (~4GB), as in the screenshot be…
-
The official ollama supports this model as of v0.3.4:
[https://github.com/ollama/ollama/releases/tag/v0.3.4](https://github.com/ollama/ollama/releases/tag/v0.3.4)
Tried with ollama in 2.1.0b20240820, …
-
Hi,
I am currently attempting to reproduce the experiments detailed in the section "Process Rewards Annotating (Taking LogiQA-v2 as an Example)" of your README.md. However, as I reach…
-
### System Info
```shell
I'm running inf2 neuron TGI on Sagemaker with optimum-neuron=0.0.25.
I'm using the SPECULATE=2 option but I get the following message in the logs:
Error: No such opt…
-
### Description of the bug:
Hi @pkgoogle ,
I used the example C++ code to run inference on the model I converted, and it shows an error.
- My command:
```
bazel run -c opt //ai_edge_torch/generative/example…
-
## Describe the bug
Download https://github.com/EricLBuehler/mistral.rs/releases/download/v0.2.2/mistralrs-server-aarch64-apple-darwin.tar.xz
Use a tool like [asitop](https://github.com/tlkh/asito…