-
When running an FP16 GGUF model fully offloaded to the GPU with the Vulkan backend, performance is much worse than running on an AVX2 CPU. Quantized models, however, perform much faster when offloaded to …
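For reference, "fully offloaded" here looks roughly like the sketch below. It uses the llama-cpp-python bindings as an assumption (the report may use the llama.cpp CLI directly), and the model path and prompt are placeholders:

```python
# Sketch only: assumes a Vulkan-enabled build of llama-cpp-python.
# The model path and prompt are placeholders, not taken from the report.
from llama_cpp import Llama

# n_gpu_layers=-1 offloads every layer to the GPU (Vulkan backend in this setup).
llm = Llama(model_path="models/model-f16.gguf", n_gpu_layers=-1)

# Compare tokens/s against a CPU-only run (n_gpu_layers=0) on the same prompt.
out = llm("Explain what the Vulkan backend does in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```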
-
### Your current environment
Collecting environment information...
INFO 08-28 14:32:56 importing.py:10] Triton not installed; certain GPU-related functions will not be available.
WARNING 08-28 14:3…
-
**Problem Link:**
Check out the issue on GitHub: [Issue #2432](https://github.com/janhq/jan/issues/2432).
**Why This Matters:**
Jan is designed to work best with newer technology, using something…
-
This is a suggestion to make the documentation and user experience more finetuning-script agnostic.
So, currently:
- `finetune/lora.py` writes a `.../lit_model_lora_finetuned.pth`…
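To illustrate the coupling, downstream code currently has to hard-code whichever filename the chosen finetuning script happens to write. A minimal sketch, assuming a hypothetical output directory (only the filename comes from `finetune/lora.py` above):

```python
# Sketch of the current coupling: the consumer must know which finetune
# script ran in order to guess the checkpoint filename.
import torch

# Hypothetical directory; only the "lit_model_lora_finetuned.pth" filename
# comes from finetune/lora.py as described above.
ckpt_path = "out/lora/lit_model_lora_finetuned.pth"
checkpoint = torch.load(ckpt_path, map_location="cpu")
print(type(checkpoint))
```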
-
**Describe the bug**
```
python -m examples.serving.causal-lm.llama-2-chat --pretrained_model_name_or_path="TinyLlama/TinyLlama-1.1B-Chat-v1.0" --max_sequence_length=1024 --max_new_tokens=256 …
```
-
I am able to run this code with no problem in the miniconda venv that I installed solely for MLX:
from mlx_lm import load, generate
model, tokenizer = load("/Users/joy/mlx_model/solar_q8")
But …
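For reference, a complete minimal run with mlx_lm looks like the sketch below (same local model directory as above; the prompt is a placeholder):

```python
from mlx_lm import load, generate

# Same local quantized model directory as in the snippet above.
model, tokenizer = load("/Users/joy/mlx_model/solar_q8")

# verbose=True also prints generation speed, which helps compare environments.
response = generate(model, tokenizer, prompt="Hello, how are you?", verbose=True)
print(response)
```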
-
When evaluating the LM, the logging information that used to provide task documentation for the task manager is no longer available.
```bash
# Execution command
NUMEXPR_MAX_THREADS=72 lm_eval --model hf …
```
-
### 🔧 Proposed code refactoring
Instead of pushing our custom `h2oai_pipeline.py` to HF, we should use new chat template features.
See example: https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat…
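As a rough sketch of the replacement (the model id is the TinyLlama chat model referenced above; the messages are placeholders), the standard `apply_chat_template` API covers the prompt formatting that `h2oai_pipeline.py` currently handles:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is a chat template?"},
]

# The template stored with the tokenizer renders the conversation into the
# model's expected prompt format, so no custom pipeline code is needed.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```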
-
### Checked other resources
- [X] I added a very descriptive title to this issue.
- [X] I searched the LangChain documentation with the integrated search.
- [X] I used the GitHub search to find a…
-
Hi
Could you please add some code for loading pretrained models from Hugging Face?
I downloaded a lightweight model in .bin format, but it didn't work.
My model:
https://huggingface.co/karpathy/tinyll…
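In the meantime, a generic sketch of fetching a checkpoint file from the Hub (the repo id and filename below are placeholders, not the exact files from the link above; how the file is consumed afterwards depends on the project):

```python
import torch
from huggingface_hub import hf_hub_download

# Placeholder repo id and filename; substitute the actual ones from the model page.
ckpt_path = hf_hub_download(repo_id="some-user/some-model", filename="pytorch_model.bin")

# A .bin checkpoint is typically a plain torch state dict.
state_dict = torch.load(ckpt_path, map_location="cpu")
print(type(state_dict))
```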