-
Hi,
I followed the instructions [here](https://github.com/nod-ai/SHARK-Turbine/tree/main/models/turbine_models/custom_models) to compile the Llama model into a .vmfb.
I specified the quantization as 4-bit…
-
This snippet will cause memory usage to rise indefinitely:
```python
from transformers import AutoTokenizer
import gc
tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v…
```
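The excerpt is cut off above; as a hedged guess at the kind of loop that shows the behaviour (the loop, the iteration count, and the full model id `TinyLlama/TinyLlama-1.1B-Chat-v1.0` are assumptions, not the original snippet):

```python
from transformers import AutoTokenizer
import gc

# Hypothetical repro sketch: repeatedly re-load the tokenizer and force a
# garbage-collection pass; if resident memory still climbs on every
# iteration, something inside the library is retaining references.
for _ in range(100):
    tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
    del tokenizer
    gc.collect()
```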
-
Maybe some of the TinyLlama, Phi, or Qwen2 small models?
-
[Jlama](https://github.com/tjake/Jlama) is a fast, modern Java library for running many LLMs.
Jlama is built on Java 21 and utilizes the [Panama Vector API](https://openjdk.org/jeps/448) for fast infe…
-
https://lightning.ai/khaliq88/vision-model/studios/prepare-the-tinyllama-1t-token-dataset/terminal?fullScreen=true
-
### Description of the bug:
Hi, I have converted TinyLlama to the TFLite format, but when I open it in `https://netron.app/` it shows a custom op. Can I find out how this op is used in TensorFlow?
When I don't expand it…
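Not part of the original report, but one way to get a per-operator breakdown of a converted model (custom ops included) from Python is TensorFlow's TFLite analyzer; the file name below is a placeholder:

```python
import tensorflow as tf

# Print an operator-by-operator report of the converted model; custom ops
# appear in this listing alongside the builtin TFLite ops.
# "tinyllama.tflite" is a placeholder for the actual converted file.
tf.lite.experimental.Analyzer.analyze(model_path="tinyllama.tflite")

# Loading the model with the interpreter is a quick sanity check: a custom op
# without a registered kernel will fail when tensors are allocated.
interpreter = tf.lite.Interpreter(model_path="tinyllama.tflite")
interpreter.allocate_tensors()
```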
-
## Describe the bug
The README page has an example of running community_tasks:
```bash
lighteval accelerate \
--model_args "pretrained=HuggingFaceH4/zephyr-7b-beta" \
--use_chat_…
```
-
I was just wondering whether the methods used in OnnxStream would further benefit a tiny language model like [TinyLlama](https://github.com/jzhang38/TinyLlama)? Just wanted to know how far resource usage…
-
**Describe the bug**
When loading TinyLlama or Llama-3-8B with dtype=int4, the model structure looks like:
```
LlamaForCausalLM(
(model): LlamaModel(
(embed_tokens): Embedding(128256, 4096)
…
```
-
Hi Jiawei,
I was trying GaLore on TinyLlama-1B using the codebase https://github.com/jzhang38/TinyLlama on 4× A800-80GB GPUs. I encountered the following error:
```
[rank1]: optimizer.step()
…
```
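For reference (not taken from the original report), the usage pattern documented in the GaLore README puts the 2-D weight matrices in a dedicated parameter group for `GaLoreAdamW`; the toy model, rank, and scale values below are illustrative assumptions rather than the issue's actual settings:

```python
import torch
from galore_torch import GaLoreAdamW

# Toy model standing in for TinyLlama; a real run would pass the
# LlamaForCausalLM parameters instead.
model = torch.nn.Linear(256, 256)

# GaLore is enabled per parameter group; 'rank', 'update_proj_gap', 'scale',
# and 'proj_type' follow the pattern in the GaLore README, with illustrative
# values (the settings used in the issue are not shown in the excerpt).
param_groups = [
    {"params": [model.weight], "rank": 128, "update_proj_gap": 200,
     "scale": 0.25, "proj_type": "std"},
    {"params": [model.bias]},  # non-GaLore parameters fall back to plain AdamW
]
optimizer = GaLoreAdamW(param_groups, lr=1e-4)

# One dummy step, mirroring the optimizer.step() call in the traceback above.
loss = model(torch.randn(4, 256)).sum()
loss.backward()
optimizer.step()
```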