-
### What is the issue?
I have been trying to import Hugging Face safetensors models, but I get the following error when trying to use the model with `run`. This happens both with and without quantiz…
-
When running `optimum-cli export openvino --trust-remote-code --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 TinyLlama-1.1B-Chat-v1.0` locally, it fails with:
```bash
Traceback (most recent call last):
…
```
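For comparison, the same export can be driven from Python via optimum-intel. A minimal sketch, assuming `optimum[openvino]` is installed; it mirrors the CLI invocation above rather than the reporter's exact environment:

```python
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

# export=True converts the PyTorch checkpoint to OpenVINO IR on the fly,
# which is what `optimum-cli export openvino` does under the hood.
model = OVModelForCausalLM.from_pretrained(
    model_id, export=True, trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

model.save_pretrained("TinyLlama-1.1B-Chat-v1.0")
tokenizer.save_pretrained("TinyLlama-1.1B-Chat-v1.0")
```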
-
This snippet will cause memory usage to rise indefinitely:
```python
from transformers import AutoTokenizer
import gc
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v…
```
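The rest of the snippet is cut off; a plausible reconstruction of the pattern being described (the loop body is an assumption, not the reporter's original code):

```python
from transformers import AutoTokenizer
import gc

# Hypothetical repro: re-creating the tokenizer in a loop. The report says
# resident memory keeps growing even though each instance is deleted and
# gc.collect() is called explicitly.
for _ in range(1000):
    tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
    del tokenizer
    gc.collect()
```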
-
Hi,
I read your interesting paper about dual-space KD and have already started trying out your code. I was able to get things running by downloading all the components, but I am not sure about some points…
-
Hi Jiawei,
I was trying GaLore on TinyLlama-1B using the codebase https://github.com/jzhang38/TinyLlama on 4x A800-80GB. I encountered the following error:
```
[rank1]: optimizer.step()
…
```
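For context, this is roughly how GaLore is typically wired up with the galore-torch package (parameter names follow its README; the integration into the TinyLlama codebase is an assumption):

```python
import torch
import torch.nn as nn
from galore_torch import GaLoreAdamW

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))

# GaLore projects gradients of 2-D weight matrices to a low-rank subspace;
# biases and other 1-D parameters go in a regular group.
galore_params = [p for p in model.parameters() if p.dim() == 2]
regular_params = [p for p in model.parameters() if p.dim() != 2]

optimizer = GaLoreAdamW(
    [
        {"params": regular_params},
        {"params": galore_params, "rank": 128, "update_proj_gap": 200,
         "scale": 0.25, "proj_type": "std"},
    ],
    lr=1e-3,
)

x = torch.randn(8, 512)
loss = model(x).pow(2).mean()
loss.backward()
optimizer.step()  # the call that raised the error in the report above
```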
-
How can I use the generated embeddings with the generateCompletion() function?
I tried setting them as an option:
```
$embeddings = $ollamaClient->generateEmbeddings($documents, 'nomic-embed-text');
…
```
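For what it's worth, embeddings are not passed to a completion call directly; the usual pattern is to use them for retrieval and splice the retrieved text into the prompt. A sketch of that pattern in Python with the official `ollama` package (the PHP client's method names will differ, and the model tags are just examples):

```python
import ollama

documents = [
    "Llamas are members of the camelid family.",
    "Llamas were first domesticated in Peru.",
]

# Embed the documents once; only the vectors are kept for retrieval.
doc_vecs = [ollama.embeddings(model="nomic-embed-text", prompt=d)["embedding"]
            for d in documents]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / ((sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5))

question = "Where were llamas domesticated?"
q_vec = ollama.embeddings(model="nomic-embed-text", prompt=question)["embedding"]

# Retrieve the most similar document and pass its *text* as context;
# the completion endpoint never sees the raw embedding vectors.
best = max(range(len(documents)), key=lambda i: cosine(q_vec, doc_vecs[i]))
answer = ollama.generate(
    model="tinyllama",
    prompt=f"Context: {documents[best]}\n\nQuestion: {question}",
)
print(answer["response"])
```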
-
Have you tried experimenting with lower-parameter models like Flan-T5, ALBERT, BERT, etc., or even Qwen 0.5B?
With fine-tuning, they might suffice in this specific domain?
I have a low-end machi…
-
### Is there an existing issue for this?
- [X] I have searched the existing issues
### Current behavior
Trying to load even the TinyLlama Chat 1.1B model doesn't work; Cortex seems to crash immediately after loading the model. This …
-
### Your current environment
```text
vLLM version 0.5.0.post1
```
### 🐛 Describe the bug
Hi,
It seems that there is a dirty cache issue with `--enable-prefix-caching`. We noticed it …
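For reference, the server flag corresponds to a constructor argument in the offline Python API; a minimal sketch of exercising shared-prefix reuse (the reporter's actual serving setup is not shown above):

```python
from vllm import LLM, SamplingParams

# --enable-prefix-caching on the server maps to this constructor flag.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0", enable_prefix_caching=True)

params = SamplingParams(temperature=0.0, max_tokens=64)

# Two prompts sharing a long prefix: the KV cache computed for the shared
# prefix should be reused on the second request.
shared = "You are a helpful assistant. " * 20
outputs = llm.generate([shared + "What is 2+2?", shared + "Name a color."], params)
for out in outputs:
    print(out.outputs[0].text)
```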
-
Fine-tuning:
- 4 models are working on Ollama (3 TinyLlama versions with 1, 10, and 50 epochs)
- I was able to train a Llama2 model (1 epoch only)
- llama.cpp deprecated some functionality, which made …