-
Several experiments are being run with this repo to understand and evaluate the effects of quantization on the `llama2.c` models.
It is a great test-bed to analyze the effects of varying app…
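As a rough illustration of the kind of measurement such experiments involve (a hypothetical sketch, not code from the repo), the snippet below quantizes a weight matrix to int8 and reports the reconstruction error; the array shape and the symmetric per-tensor scheme are assumptions made for the example.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: returns quantized weights and scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for one transformer weight matrix (shape is arbitrary for the demo).
    w = rng.normal(0, 0.02, size=(1024, 1024)).astype(np.float32)
    q, scale = quantize_int8(w)
    w_hat = dequantize(q, scale)
    mse = float(np.mean((w - w_hat) ** 2))
    print(f"scale={scale:.6e}  mse={mse:.3e}  max_abs_err={np.abs(w - w_hat).max():.3e}")
```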
-
### Priority
Undecided
### OS type
Ubuntu
### Hardware type
Xeon-GNR
### Installation method
- [X] Pull docker images from hub.docker.com
- [ ] Build docker images from source
### Deploy metho…
-
## 🚀 Feature
Hello,
Is it possible to run the LLM model (Llama 2 7B Quantized) on the Qualcomm Hexagon NPU in Android OS?
If so, how can the model be run on the Qualcomm Hexagon NPU in Android OS?
…
-
## Expected Behavior
The model loads successfully.
## Current Behavior
The model does not load.
## Steps to Reproduce
I don't know exactly; I installed LOLLMs with the Windows installer .bat file and chose the op…
-
### Priority
Undecided
### OS type
Ubuntu
### Hardware type
Xeon-SPR
### Installation method
- [ ] Pull docker images from hub.docker.com
- [ ] Build docker images from source
### Deploy metho…
-
### Priority
P2-High
### OS type
Ubuntu
### Hardware type
Xeon-GNR
### Installation method
- [ ] Pull docker images from hub.docker.com
- [ ] Build docker images from source
### Deploy method
…
-
**Expected Outcomes**
- Prompt: Summarize the content from the url (do not emit the url back) https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/managing_file_systems/ind…
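A minimal sketch of the flow this test case exercises, assuming the model is served behind an OpenAI-compatible chat endpoint on localhost:8080 (the endpoint, model name, and helper below are placeholders, not part of the original report):

```python
import sys
import requests

API_URL = "http://localhost:8080/v1/chat/completions"  # assumed serving endpoint

def summarize_url(url: str, limit: int = 8000) -> str:
    # Fetch the raw page; a real test harness would strip HTML before prompting.
    page = requests.get(url, timeout=30).text[:limit]
    payload = {
        "model": "served-model",  # placeholder model name
        "messages": [{
            "role": "user",
            "content": "Summarize the content from the url (do not emit the url back):\n\n" + page,
        }],
    }
    resp = requests.post(API_URL, json=payload, timeout=120)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(summarize_url(sys.argv[1]))
```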
-
[Imitation] https://github.com/3dw/start-learn/issues/10 and https://members-frontend.pages.dev/
[Experiment] Use Cloudflare Vectorize to build a Q&A bot over the [創源工具資料] (tool data).
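A minimal sketch of the retrieval step such a bot needs. In the actual experiment the vectors would live in Cloudflare Vectorize and be queried from a Worker; the in-memory index and hash-seeded embedding below are stand-ins for illustration only.

```python
import hashlib
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    # Placeholder embedding, seeded from a hash of the text so it is deterministic.
    # A real bot would call an embedding model instead.
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "little")
    rng = np.random.default_rng(seed)
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

# Example documents standing in for the tool-data corpus.
docs = [
    "How to register a new tool in the catalog.",
    "Steps for exporting tool data as CSV.",
    "Contact information for project maintainers.",
]
index = np.stack([embed(d) for d in docs])

def retrieve(question: str, top_k: int = 1) -> list[str]:
    scores = index @ embed(question)          # cosine similarity (vectors are unit-norm)
    best = np.argsort(scores)[::-1][:top_k]
    return [docs[i] for i in best]

print(retrieve("How do I export the data?"))
```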
-
### Model description
I'm using [neural-chat-7b-v3-1](https://huggingface.co/intel/neural-chat-7b-v3-1) locally on my laptop and it would sure be sweet if I could serve it through tgi.
I can curre…
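For reference, once a TGI server is serving the model, querying it looks roughly like the sketch below; the host, port, prompt, and generation parameters are assumptions about a local setup, not taken from the report.

```python
import requests

TGI_URL = "http://localhost:8080/generate"  # assumed local TGI instance

payload = {
    "inputs": "What is the capital of France?",
    "parameters": {"max_new_tokens": 64, "temperature": 0.7},
}

resp = requests.post(TGI_URL, json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["generated_text"])
```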
-
The TGI image tagged "text-generation-inference:latest-intel-cpu" fails to come up with "Intel/neural-chat-7b-v3-3" after the image was upgraded to the build with "Created": "2024-08-20T20:17:15.74262894…