-
### Bug Description
```
(poetry-test-py3.12) ➜ poetry-test poetry lock
Updating dependencies
Resolving dependencies... (0.6s)
Because llama-index (0.10.50) depends on llama-index-core (0.10.…
```
-
### Your current environment
driver 1.17
vllm 0.5.3.post1+gaudi117
```text
export VLLM_GRAPH_RESERVED_MEM=0.1
export VLLM_GRAPH_PROMPT_RATIO=0.9
export VLLM_PROMPT_S…
```
-
### Feature request
- Support loading from sharded GGUF model files (For example, [legraphista/Meta-Llama-3.1-70B-Instruct-IMat-GGUF](https://huggingface.co/legraphista/Meta-Llama-3.1-70B-Instruct-…
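Sharded GGUF repositories typically split one model into files named `<base>-00001-of-0000N.gguf`. That naming pattern is an assumption based on common Hugging Face conventions, not something this request specifies; the base name below is a placeholder. A minimal sketch of enumerating the shard filenames a loader would need to open in order:

```python
# Sketch: enumerate the filenames of a GGUF model split into `total` shards.
# The "-%05d-of-%05d" suffix is an assumed convention; verify it against
# the actual files in the repository you are loading.
def shard_names(base: str, total: int) -> list[str]:
    """Return the expected shard filenames, first to last."""
    return [f"{base}-{i:05d}-of-{total:05d}.gguf" for i in range(1, total + 1)]

names = shard_names("Meta-Llama-3.1-70B-Instruct.Q4_K_M", 3)
print(names[0])   # Meta-Llama-3.1-70B-Instruct.Q4_K_M-00001-of-00003.gguf
print(names[-1])  # Meta-Llama-3.1-70B-Instruct.Q4_K_M-00003-of-00003.gguf
```

As a stopgap while single-file loading is required, llama.cpp ships a `gguf-split` tool whose `--merge` mode can recombine shards into one file.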
-
Log output:
```
(llm-cpp) D:\Users\Documents\Projects\llama-cpp>server.exe -m "Qwen1.5-MoE-A2.7B-Chat.Q4_K_M.gguf" -ngl 999
{"tid":"12472","timestamp":1719324322,"level":"INFO","function":"main",…
```
-
I tested continued training of llama3 across multiple machines with tp4 pp2 dp2. If I enabled gradient accumulation, the training would hang. The experimental environment is: 16× H800, torch 2.1.2+cu121.
checkpoints:…
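Not the cause of the hang, but for context on what gradient accumulation changes in such a run: the effective global batch size is micro-batch × accumulation steps × data-parallel size (tensor and pipeline parallelism do not multiply it). A sketch with placeholder batch numbers, since the report does not give the actual settings:

```python
# Effective global batch size under gradient accumulation.
# micro_batch and grad_accum_steps are placeholders; dp=2 matches the
# "tp4 pp2 dp2" layout above (16 GPUs = 4 * 2 * 2).
def effective_batch(micro_batch: int, grad_accum_steps: int, dp: int) -> int:
    return micro_batch * grad_accum_steps * dp

print(effective_batch(micro_batch=2, grad_accum_steps=8, dp=2))  # 32
```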
-
llama official website: https://llama.meta.com/docs/llama-everywhere/running-meta-llama-on-mac/
## Describe the bug
curl http://localhost:11434/api/chat -d '{
"model": "llama3",
"messages": …
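For reference, a complete, runnable version of the truncated request body above, sketched in Python. The model name and message content are placeholders; adjust them for your setup.

```python
import json

# Build the /api/chat request body that the curl command above truncates.
# "stream": False asks Ollama for one final JSON object instead of a
# line-by-line stream of partial responses.
payload = {
    "model": "llama3",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": False,
}
body = json.dumps(payload)
print(body)
# To send it against a running Ollama server (not done here):
#   curl http://localhost:11434/api/chat -d '<body printed above>'
```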
-
Hello NDIF Team,
I was following the tutorial https://nnsight.net/notebooks/tutorials/walkthrough/#2-Bigger to learn nnsight with remote execution. After registering the API key and trying to repro…
-
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
==((====))== Unsloth: Fast Llama patching release 2024.6
\\ /| GPU: NVIDIA A100 80GB PCIe MIG 7g.80gb. Max memory: 7…
-
### What is the issue?
My setup is 4× A100 80GB, 2 TB RAM, and dual Intel CPUs, running Ubuntu Server 22.04.
On a previous version of ollama, the model llama3.1:405b was loaded in a reasonable amount of second…
-
In the tutorial [Running LLaMA 3 8B with TensorRT-LLM on Serverless GPUs](https://www.cerebrium.ai/blog/running-llama-3-8b-with-tensorrt-llm-on-serverless-gpus), you mentioned a GitHub link for the tr…