-
### System Info
When initializing LlamaTokenizer from the Transformers library, the returned object is a bool rather than a tokenizer instance. This issue persists across different environments and Python versions.
…
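For reference, a minimal repro sketch of the reported behavior (the checkpoint path is a placeholder):

```python
# Minimal repro sketch; the checkpoint path is a placeholder.
from transformers import LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("path/to/llama-checkpoint")
# Reported behavior: the returned object is a bool instead of a
# LlamaTokenizer instance.
print(type(tokenizer))
```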
-
I am following the [instructions in the Llama2 README](https://github.com/pytorch/executorch/blob/d9aeca556566104c2594ec482a673b9ec5b11390/examples/models/llama2/README.md#instructions) to test llama m…
-
### Your current environment
- vLLM version: v0.5.3.post1 (public Docker image)
- Model: Llama 3 70B
- Dtype: FP16
- GPU: Nvidia H100
### 🐛 Describe the bug
The vLLM metrics endpoint is showin…
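For context, the metrics can be inspected directly from the Prometheus endpoint (a sketch assuming the server's default `localhost:8000` address):

```python
# Sketch: dump vLLM's Prometheus metrics; assumes the OpenAI-compatible
# server is listening on its default port 8000.
import requests

resp = requests.get("http://localhost:8000/metrics")
resp.raise_for_status()
for line in resp.text.splitlines():
    # vLLM-specific metrics are prefixed with "vllm:".
    if line.startswith("vllm:"):
        print(line)
```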
-
When I quantized the Qwen2.5-1.5B-Instruct model following **"Quantizing the GGUF with AWQ Scale"** in the [docs](https://qwen.readthedocs.io/en/latest/quantization/llama.cpp.html), it showed that th…
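For context, the AWQ-scale step from that doc looks roughly like the sketch below; the model path and quantization config are placeholders.

```python
# Sketch of the "AWQ scale" step; model path and quant config are placeholders.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "Qwen/Qwen2.5-1.5B-Instruct"
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# export_compatible=True only applies the AWQ scales to the fp16 weights
# (no actual quantization), so the saved checkpoint can then be converted
# to GGUF and quantized with llama.cpp.
model.quantize(tokenizer, quant_config=quant_config, export_compatible=True)
model.save_quantized("Qwen2.5-1.5B-Instruct-awq-scaled")
```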
-
### Jan version
0.5.10
### Describe the Bug
https://discord.com/channels/1107178041848909847/1313496019475894363
When using the Retrieval (RAG) feature with PDF files, the engine fails to …
-
### Name and Version
```
.\llama-cli.exe --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
Device 0: NVIDIA…
-
In any config where `checkpoint_files` is a list of more than 4 files, use the FormattedFiles utility to shrink the size of the config file.
Example from [llama3/70B_lora](https://github.com/pytorch/torchtune/blob/…
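For illustration, a sketch of what the collapsed entry could look like; the class name, fields, and file count are assumptions based on torchtune's formatted checkpoint-files utility, not confirmed by the issue text.

```python
# Sketch (names assumed): a formatted checkpoint-files utility replaces a
# long explicit list with a filename template plus a maximum index.
from torchtune.training import FormattedCheckpointFiles

# Stands in for listing model-00001-of-00030.safetensors through
# model-00030-of-00030.safetensors one by one.
checkpoint_files = FormattedCheckpointFiles(
    filename_format="model-{}-of-{}.safetensors",
    max_filename="00030",
)
```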
-
I got the following error when running a model imported from GGUF, which was generated from a model fine-tuned with LoRA.
Error: llama runner process has terminated: GGML_ASSERT(src1t == GGML_TYPE_F…
-
### Contact Details
ksilverstein@mozilla.com
### What happened?
Summary: the llamafiler `/tokenize` endpoint does not seem to add special tokens when the corresponding flag is set to true, at…
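For context, a sketch of the kind of request involved; the server address and the `add_special` parameter name are assumptions, not confirmed by the truncated report.

```python
# Sketch of the reported behavior; host/port and parameter name are assumed.
import requests

resp = requests.get(
    "http://localhost:8080/tokenize",
    params={"prompt": "hello world", "add_special": "true"},
)
# Expectation: the token list starts with the model's BOS token;
# reportedly it does not, even with the flag set to true.
print(resp.json())
```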
-
Thanks for adding support for VLMs.
I was using [this](https://github.com/stanfordnlp/dspy/blob/main/examples/vlm/mmmu.ipynb) notebook. I tried it with the `Qwen2-VL-7B-Instruct` and `Llama-3.2-11B-Vision-…