-
- [X] I have searched the existing issues
### Current behavior
Trying to load even the TinyLlama Chat 1.1B model doesn't work; Cortex seems to crash immediately after loading the model. This …
-
Hi,
I saw [this example](https://twitter.com/awnihannun/status/1736785120085024821) and was wondering if it might be possible to train TinyLlama with LoRA. I haven't been able to figure out how to co…
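For context on what LoRA training involves: LoRA keeps the base weight matrix W frozen and learns a low-rank update BA, scaled by alpha/r, that can later be merged back into W. A minimal pure-Python sketch of that merge step (toy sizes, no ML framework; all names here are illustrative, not the MLX example's API):

```python
def matmul(X, Y):
    # Naive matrix multiply over nested lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_delta(A, B, alpha, r):
    # Low-rank update (alpha/r) * B @ A, with B: (d_in x r) and A: (r x d_out).
    scale = alpha / r
    return [[scale * v for v in row] for row in matmul(B, A)]

def merged_weight(W, A, B, alpha, r):
    # Merge the LoRA update into the frozen base weight: W' = W + (alpha/r) * B @ A.
    delta = lora_delta(A, B, alpha, r)
    return [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]
```

Because r is much smaller than the weight dimensions, only B and A are trained, which is what makes fine-tuning a model like TinyLlama feasible on modest hardware.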
-
### Description of the bug:
- using generative/example/tiny_llama/convert_to_tflite.py to convert the model to `*.tflite` (no quantization)
- using text_generator_main.cc to load `tiny_llama_seq512_kv102…
-
How can I use the generated embeddings with the generateCompletion() function?
I tried setting it as an option
```
$embeddings = $ollamaClient->generateEmbeddings($documents, 'nomic-embed-text');
// …
```
-
### System Info
Docker image: ghcr.io/huggingface/text-generation-inference:2.2.0-rocm
Hardware: AMD MI250
### Information
- [X] Docker
- [ ] The CLI directly
### Tasks
- [x] An officially suppo…
-
When running `dataset.map` with `num_proc=16`, I am unable to tokenize a ~45 GB dataset on a machine with >200 GB of RAM. The dataset consists of ~30,000 rows, each a string of 120-180k characters.
The m…
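With rows this long, a common workaround is to split each string into bounded, slightly overlapping chunks before tokenizing, so no single map batch holds a 180k-character example in memory at once. A minimal stand-alone sketch (the `chunk_text` helper is hypothetical, not part of the `datasets` library):

```python
def chunk_text(text, size=4096, overlap=256):
    """Split text into chunks of at most `size` chars, overlapping by `overlap`."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
    return chunks
```

Each chunk can then be tokenized independently (e.g. inside a batched map function), and the overlap preserves some context across chunk boundaries.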
-
https://github.com/jzhang38/TinyLlama/blob/0fcf9b61130f189b78747b0b013262c72f01286a/pretrain/tinyllama.py#L199C8-L208
-
### Your current environment
```text
vLLM version 0.5.0.post1
```
### 🐛 Describe the bug
Hi,
Seems that there is a dirty cache issue with `--enable-prefix-caching`. We noticed it …
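For readers unfamiliar with the feature: prefix caching keys saved KV state by the prompt's token prefix, so a stale ("dirty") entry that is not invalidated will be served for a matching key even after the underlying state has changed. A toy illustration of that failure mode (this is not vLLM's implementation, just the general idea):

```python
class PrefixCache:
    """Toy prefix cache: maps a token-prefix key to a computed value."""

    def __init__(self):
        self._store = {}

    def get_or_compute(self, tokens, compute):
        # Reuse a cached result for this exact token prefix if present.
        key = tuple(tokens)
        if key not in self._store:
            self._store[key] = compute(tokens)
        return self._store[key]

    def invalidate(self, tokens):
        # Without eviction like this, a changed backing state would keep
        # serving the old (dirty) cached value for the same prefix.
        self._store.pop(tuple(tokens), None)
```

The bug report above suggests vLLM is hitting the "missing invalidation" case: a cached prefix survives past the point where its KV state is still valid.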
-
Have you tried experimenting with lower-parameter models like Flan-T5, ALBERT, BERT, etc., or even Qwen 0.5B?
With fine-tuning, they might suffice in this specific domain.
I have a low-end machi…
-
I have cloned the [tinyllamas](https://huggingface.co/karpathy/tinyllamas) repo and am trying the stories260K model. It fails with a malloc error:
```
./run tinyllamas/stories260K/stories260K.bin …
```