-
I followed the guide here: https://github.com/intel/ai-reference-models/tree/main/models_v2/pytorch/llama/training/cpu, and faced several issues:
1. https://github.com/intel/ai-reference-models/blob/main/m…
-
### Proposal to improve performance
The speculative decoding (spec dec) performance of Eagle is worse than expected, as shown below:
Model: [meta-llama/Meta-Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Llam…
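For context, speculative decoding accepts drafted tokens with a probability-ratio test against the target model. A toy sketch of that acceptance loop (illustrative only; this is not Eagle's actual implementation, and all names here are hypothetical):

```python
import random

def count_accepted(draft_probs, target_probs, rng=random.Random(0)):
    """Toy speculative-decoding acceptance loop (not Eagle's real code).

    Each drafted token with draft probability q is accepted with
    probability min(1, p / q), where p is the target model's probability
    for that token; the first rejection discards the rest of the draft.
    """
    accepted = 0
    for q, p in zip(draft_probs, target_probs):
        if rng.random() < min(1.0, p / q):
            accepted += 1
        else:
            break  # rejection: drop remaining drafted tokens
    return accepted

# When the target agrees strongly with the draft, most tokens survive;
# when it assigns zero probability, the draft is rejected immediately.
print(count_accepted([0.5, 0.5], [1.0, 1.0]))  # accepts both tokens
print(count_accepted([0.5], [0.0]))            # accepts none
```

A low acceptance rate in this loop is one common reason spec-dec speedups fall short of expectations, since each rejection wastes the drafted tail.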
-
### Before submitting your bug report
- [X] I believe this is a bug. I'll try to join the [Continue Discord](https://discord.gg/NWtdYexhMs) for questions
- [X] I'm not able to find an [open issue](ht…
-
### Describe the bug
For whatever reason, this method runs SO SLOW in WSL2. I followed all the instructions to run a Codestral model, but it just runs at the speed of smell. It seems to be using my C…
-
Alongside the development of small language models, compressed language models play a crucial role as well.
Typical representatives (in chronological order) would be:
1) sheared-llama (https://arxiv.org/abs/231…
-
### What happened?
I notice that https://github.com/ggerganov/llama.cpp/issues/5237 is the prior issue for the same bug. However, it was closed without being either confirmed or fixed.
Run with validation layer en…
-
```python
import os
import yaml
from loguru import logger
from langchain_core.language_models.chat_models import BaseChatModel
from langchain_community.chat_models import ChatLiteLLMRouter
import li…
-
I first ran "python -m omni_speech.serve.model_worker --host myIP --controller http://myIP:10000 --port 40000 --worker http://myIP:40000 --model-path Llama-3.1-8B-Omni --model-name Llama-3.1-8B-Om…
-
### Before submitting your bug report
- [X] I believe this is a bug. I'll try to join the [Continue Discord](https://discord.gg/NWtdYexhMs) for questions
- [X] I'm not able to find an [open issue]…
-
The addon is not compatible with newer AI models such as Gemini Flash and the Llama-3.2 Vision 11B model.