-
Does anyone know how to use Llama 3.1 or Llama 3 in this addon?
I've tried downloading "Meta-Llama-3.1-8B-Instruct-Q2_K.gguf" from https://huggingface.co/bullerwins/Meta-Llama-3.1-8B-Instruct-GGUF/tree/main…
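For reference, a minimal sketch of loading that GGUF file, assuming the addon wraps llama-cpp-python (the file name is the one from the report; the path, context size, and prompt are illustrative):

```python
# Minimal sketch, assuming the addon wraps llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="Meta-Llama-3.1-8B-Instruct-Q2_K.gguf",  # file named in the report
    n_ctx=8192,       # context window to allocate
    n_gpu_layers=-1,  # offload all layers to the GPU if one is available
)
resp = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hello."}],
    max_tokens=64,
)
print(resp["choices"][0]["message"]["content"])
```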
-
This occurs when using two GPUs, but not when I use just one.
I made sure to update to the Docker image used in the Dockerfile.
commit: a702c6dd2944aaf75800b11f4dfeec6fe5a9b068…
-
I was running the example script `examples/scripts/train_ppo_llama.sh`.
Basically, it's PPO on Llama 3 8B with 8×H100s, using flash_attn, ZeRO-3, gradient_checkpointing, and adam_offload, but it goes OOM after some…
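For context, a hedged sketch of what that combination implies on the DeepSpeed side: a ZeRO stage-3 config with the Adam states offloaded to CPU. The values below are illustrative, not the exact config the script generates:

```python
# Sketch of a DeepSpeed ZeRO stage-3 config with Adam offloaded to CPU,
# matching the zero3 + adam_offload combination described above.
# (Illustrative batch size and clipping; not OpenRLHF's actual config.)
ds_config = {
    "train_micro_batch_size_per_gpu": 2,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,                  # partition params, grads, and optimizer states
        "offload_optimizer": {       # adam_offload: keep Adam states in CPU RAM
            "device": "cpu",
            "pin_memory": True,
        },
    },
    "gradient_clipping": 1.0,
}
```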
-
Hi, I have noticed that there is a huge difference in memory usage for the runtime buffers and decoder between Llama 3 and Llama 3.1. Is it possible to know why?
I have built an 8-bit quantised Llama 3 engine a…
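One plausible explanation: Llama 3.1 raised the default context length from 8k to 128k, and runtime buffers such as the KV cache scale linearly with the maximum sequence length the engine is built for. A back-of-envelope calculation, assuming standard Llama-3-8B shapes (32 layers, 8 KV heads under GQA, head_dim 128, fp16):

```python
# Rough KV-cache size, assuming Llama-3-8B shapes: 32 layers,
# 8 KV heads (GQA), head_dim 128, fp16 (2 bytes per element).
def kv_cache_bytes(seq_len, batch=1, layers=32, kv_heads=8, head_dim=128, dtype_bytes=2):
    return 2 * layers * kv_heads * head_dim * seq_len * batch * dtype_bytes  # 2x: keys and values

print(kv_cache_bytes(8_192) / 2**30)    # Llama 3 default (8k):     ~1 GiB per sequence
print(kv_cache_bytes(131_072) / 2**30)  # Llama 3.1 default (128k): ~16 GiB per sequence
```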
-
C:\Users\razvan\Downloads\mindcraft-main\mindcraft-main>node main.js
file:///C:/Users/razvan/Downloads/mindcraft-main/mindcraft-main/settings.js:8
"profiles": [
^^^^^^^^^^
SyntaxError: U…
-
**Describe the bug**
I am trying to use Meta Llama 3.1 and 3.2, which require inference profile support.
I am getting this error: `Unsupported model us.meta.llama3-1-70b-instruct-v1:0, please use models API to get…
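For what it's worth, a hedged sketch of calling that cross-region inference profile through boto3's Converse API (region and prompt are placeholders):

```python
# Sketch: invoking the cross-region inference profile via Bedrock's Converse API.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")  # placeholder region
response = client.converse(
    modelId="us.meta.llama3-1-70b-instruct-v1:0",  # profile ID from the error message
    messages=[{"role": "user", "content": [{"text": "Hello"}]}],
)
print(response["output"]["message"]["content"][0]["text"])
```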
-
I am trying to send a rather long prompt (36k tokens) to vLLM-supported models, in particular llama3_8B_Instruct. However, I am getting the error below:
scheduler.py:648] Input prompt (36893 tokens)…
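The prompt likely exceeds the engine's max_model_len. A hedged sketch of raising it is below; note that the base Llama 3 8B Instruct checkpoint was trained with an 8k context, so a 36k-token prompt also needs a long-context checkpoint (such as Llama 3.1) or RoPE scaling. The model name and limit are illustrative:

```python
# Sketch: serving with a max_model_len large enough for the 36k-token prompt.
# Assumes a long-context checkpoint; model name and limit are illustrative.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # 128k-context variant
    max_model_len=40960,  # must cover prompt tokens plus generated tokens
)
outputs = llm.generate(["<36k-token prompt>"], SamplingParams(max_tokens=256))
print(outputs[0].outputs[0].text)
```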
-
**Description**
I have noticed that there is a huge difference in memory usage for the runtime buffers and decoder between Llama 3 and Llama 3.1.
**Triton Information**
What version of Triton are you usin…
-
**Is your feature request related to a problem? Please describe.**
Llama 3.2 was released, and since it has multimodal support it would be great to have it in LocalAI.
**Describe the solution you'd li…