-
## 🐛 Bug
```
python3 -m mlc_llm.build --hf-path TinyLlama/TinyLlama-1.1B-Chat-v0.6 --target iphone --quantization q4f16_1 --use-cache 0 --use-safetensors …
```
-
I think it would be good to track how the performance of TinyLlama and TinyLlama-Chat evolves across checkpoints.
We could do this through the HF leaderboard, but it takes quite long.
What would you sugge…
-
Were there any trade-offs or considerations you made when deciding on the model's size? What criteria did you use to select the specific number of layers, attention heads, embedding size, etc., in…
-
### Describe the issue as clearly as possible:
When I try the examples on the GitHub front page, some do not work from a fresh conda environment.
### Steps/code to reproduce the bug:
```python
…
```
-
I was trying to use this, but running script.sh didn't work after installing the requirements in a venv. I also tried running chat_gradio, but the gradio package was not even in requirements. It worked after I pip ins…
-
# Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [x] I am running the latest code. Development is very rapid, so there are no tagged versions as of…
-
Amazing work! I really like the project!
If I understand correctly, TinyLlama/TinyLlama-1.1B-Chat-v0.6 is fine-tuned following the Zephyr recipes from HF4.
I assume you did a full training and not…
-
As of at least the two most recent versions, I have been experiencing a lot of issues with Ollama. Primarily, it reports that it can't connect to the server when using the Ollama CLI commands…
-
# Expected Behavior
I expected finetune to produce a usable LoRA adapter for all supported models.
# Current Behavior
For Mistral models (I tried both Mistral and Zephyr, Q8_0, Q5_K_M, Q5_0) …
-
After fine-tuning the model, I obtained a 2.2 GB PyTorch model.bin file. Is it possible to reduce this model size to 550 MB, and if so, how and when can we achieve this?
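As a rough sketch of the size arithmetic (my own back-of-the-envelope reasoning, not an official answer): a ~1.1B-parameter model stored in 16-bit floats is about 2.2 GB, so quantizing the weights to 4 bits per parameter would land near 550 MB. The helper below is hypothetical and ignores quantization metadata (scales, zero-points), which adds a small overhead in practice.

```python
def approx_size_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate checkpoint size in GB: parameters x bits, divided by 8 bits per byte.

    Ignores quantization metadata (per-group scales and zero-points),
    which adds a few percent on top in real formats like q4 GGUF.
    """
    return n_params_billion * bits_per_weight / 8


# ~1.1B parameters at fp16 gives roughly the observed 2.2 GB file;
# 4-bit weights bring that down to roughly 550 MB.
fp16_gb = approx_size_gb(1.1, 16)  # ~2.2
q4_gb = approx_size_gb(1.1, 4)     # ~0.55
```

In other words, the 550 MB target corresponds to roughly 4-bit quantization of a 2.2 GB fp16 checkpoint, a 4x reduction.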