-
### Priority
P1-Stopper
### OS type
Ubuntu
### Hardware type
Xeon-other (Please let us know in description)
### Installation method
- [ ] Pull docker images from hub.docker.com
- [X] Build dock…
-
### System Info
Hi there, I hit a bug when using TGI Gaudi 2.0.5 with both meta-llama/Meta-Llama-3-8B-Instruct and Intel/neural-chat-7b-v3-3: when I set the default frequency/repetition/presen…
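The report is cut off above, but for context, a minimal sketch of the kind of request involved, assuming a TGI server already listening on localhost:8080; the prompt and parameter values are placeholders, not the reporter's actual reproduction:
```python
import requests

# Hypothetical sketch: query a running TGI endpoint with explicit
# penalty parameters (values here are placeholders).
resp = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "What is deep learning?",
        "parameters": {
            "repetition_penalty": 1.0,
            "frequency_penalty": 0.0,
            "max_new_tokens": 64,
        },
    },
    timeout=60,
)
print(resp.json())
```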
-
In the inference script, the default weights of RS-LLaVA are as follows:
```
model_path = 'BigData-KSU/RS-llava-v1.5-7b-LoRA'
model_base = 'Intel/neural-chat-7b-v3-3'
```
However, these two models do not…
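RS-LLaVA ships its own loader, but as a general illustration of the base-plus-adapter relationship those two paths imply, here is a minimal peft sketch; it loads only the language model, not the vision tower, and is an assumption rather than the project's actual loading code:
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base LLM, then attach the LoRA adapter weights on top of it.
base = AutoModelForCausalLM.from_pretrained("Intel/neural-chat-7b-v3-3")
model = PeftModel.from_pretrained(base, "BigData-KSU/RS-llava-v1.5-7b-LoRA")
```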
-
Today, the Workers AI types rely heavily on function overloads to specify the arguments for different models. Unfortunately, this results in types that are very difficult to debug and a poor developer experience (DX).
As an example wit…
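The example itself is cut off above, but the overload-per-model pattern being criticized can be sketched language-neutrally; the actual Workers AI types are TypeScript, and the names below are hypothetical, not the real API:
```python
from typing import Literal, overload

# Hypothetical illustration of the pattern: one overload per model name.
# A mismatched call produces an error that enumerates every overload,
# which is what makes this style hard to debug at scale.
@overload
def run(model: Literal["@cf/text-generation-model"], prompt: str) -> str: ...
@overload
def run(model: Literal["@cf/embedding-model"], text: str) -> list[float]: ...
def run(model: str, *args, **kwargs):
    raise NotImplementedError
```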
-
### Summary
- Provide k-quant models
- Maintain existing gguf models
- Embedding models
- [x] [second-state/Nomic-embed-text-v1.5-Embedding-GGUF](https://huggingface.co/second-state/Nomic-…
-
The Intel GPU Flex 140 has two GPUs per card, with a total memory capacity of 12 GB (6 GB per GPU). Currently, I can run inference on only one GPU device, with limited memory. Could you please guide me on how to run…
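Not an answer from the maintainers, but a minimal device-enumeration sketch, assuming intel_extension_for_pytorch is installed so the torch.xpu backend is available; the tensors are placeholders for real model inputs:
```python
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401  (registers torch.xpu)

# A Flex 140 card should enumerate as two separate 6 GB devices.
for i in range(torch.xpu.device_count()):
    print(i, torch.xpu.get_device_name(i))

# One simple pattern: keep an independent workload per device.
a = torch.randn(4, 4, device="xpu:0")
b = torch.randn(4, 4, device="xpu:1")
print((a @ a).device, (b @ b).device)
```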
-
I'm not well versed in Python; where do I put the downloaded llama-2-7b-chat.Q4_0.gguf file?
I can get llama.cpp working easily on my laptop, but I can't seem to get this to work.
I did git c…
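The repo's own loading step is cut off above, but if the project wraps llama.cpp the way llama-cpp-python does, the file can live anywhere as long as its path is passed explicitly; a minimal sketch (the models/ directory is just a convention, not a requirement):
```python
from llama_cpp import Llama

# The .gguf file can sit in any directory; pass its path explicitly.
llm = Llama(model_path="./models/llama-2-7b-chat.Q4_0.gguf")
out = llm("Q: What is the capital of France? A:", max_tokens=32)
print(out["choices"][0]["text"])
```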
-
Loading the saved model runs into the following error. It also takes a very long time to run and save quantized models.
```
2024-03-21 08:48:58 [INFO] loading weights file models/4_bit_llama2-rtn/model.sa…
```
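For context, a minimal save-then-reload sketch of the flow the log suggests, assuming the RtnConfig weight-only quantization API available in recent intel_extension_for_transformers releases; the model name is illustrative, and only the output path is taken from the log:
```python
from intel_extension_for_transformers.transformers import (
    AutoModelForCausalLM,
    RtnConfig,
)

model_name = "meta-llama/Llama-2-7b-hf"  # illustrative

# Quantize with round-to-nearest (RTN) 4-bit weights and save the result.
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=RtnConfig(bits=4)
)
model.save_pretrained("models/4_bit_llama2-rtn")

# Reload the already-quantized checkpoint from disk.
model = AutoModelForCausalLM.from_pretrained("models/4_bit_llama2-rtn")
```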
-
I changed the model by modifying the GitHub code: the code was updated to Llama 3.1, written in a format that matches both the Cloudflare model naming and the project's model format. After redeploying, the new model cannot be used.
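To isolate whether the deployment or the model ID is at fault, one quick check is calling the model outside the Worker via the REST endpoint; a sketch assuming `@cf/meta/llama-3.1-8b-instruct` is the intended catalog ID (the account ID and token are placeholders):
```python
import requests

ACCOUNT_ID = "your_account_id"  # placeholder
API_TOKEN = "your_api_token"    # placeholder
MODEL = "@cf/meta/llama-3.1-8b-instruct"  # assumed catalog ID for Llama 3.1

resp = requests.post(
    f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/{MODEL}",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json={"messages": [{"role": "user", "content": "Hello"}]},
    timeout=60,
)
print(resp.json())
```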
-
This happens using the example code only:
```
from transformers import AutoTokenizer, TextStreamer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM
model_name = "Intel/neur…