-
Issue from @linhtran174
Running neural-chat-7b consumes 100% CPU; the model has not started after more than 2 minutes, and the GPU is not utilized 😢.
Update: It works, just too slow. Token speed at 0.0…
-
## ❓ General Questions
Due to #1755, I ignored the specific chat template for the Baichuan model.
However, when I added the template just like #1701 (to the files `cpp/conv_template.cc` and `pyth…
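As background for the question, a conversation template just decides which role markers and separators get spliced around each message before the prompt reaches the model. The sketch below is a generic, made-up illustration of that idea; the tags are hypothetical, and the real Baichuan template must be taken from its model card or tokenizer config, not from this example.

```python
# Generic sketch of a conversation template: role tags and a separator
# are wrapped around each turn. The tags below are made-up placeholders,
# NOT the actual Baichuan special tokens.
def build_prompt(messages, user_tag="<user>", assistant_tag="<assistant>", sep="\n"):
    parts = []
    for role, text in messages:
        tag = user_tag if role == "user" else assistant_tag
        parts.append(f"{tag}{text}")
    # End with a bare assistant tag so the model continues from there.
    parts.append(assistant_tag)
    return sep.join(parts)

prompt = build_prompt([("user", "Hello"), ("assistant", "Hi!"), ("user", "How are you?")])
print(prompt)
```

Getting these tags wrong (or omitting them) is exactly what degrades output quality, which is why adding the model-specific template matters.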
-
When I try to use this demo:
```python
from transformers import AutoTokenizer, TextStreamer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM
model_name = "/itrex/neural-chat-…
```
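For context on what the demo is doing: `TextStreamer`'s job is to emit decoded tokens as they are generated instead of waiting for the full completion. The dependency-free sketch below illustrates that streaming idea only; `SimpleStreamer` is a hypothetical minimal class, not the actual `transformers.TextStreamer` implementation.

```python
# Dependency-free sketch of what a token streamer does: receive decoded
# token strings one at a time and print them immediately, while also
# accumulating the full text. (Hypothetical class for illustration.)
class SimpleStreamer:
    def __init__(self):
        self.pieces = []

    def put(self, token_text: str) -> None:
        self.pieces.append(token_text)
        print(token_text, end="", flush=True)  # show output as it arrives

    def end(self) -> str:
        print()
        return "".join(self.pieces)

streamer = SimpleStreamer()
for tok in ["Hello", ",", " world", "!"]:
    streamer.put(tok)
text = streamer.end()
```

In the real demo, the streamer is passed to `model.generate(..., streamer=streamer)` so tokens appear in the console during generation.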
-
### System Info
LangChain 0.0.292
Python 3.11.5
GPT4ALL with LLAMA q4_0 3b model running on CPU
### Who can help?
@agola11
### Information
- [ ] The official example notebooks/scripts…
-
When I began trying to determine working models for this application (https://github.com/imartinez/privateGPT/issues/1205), I did not understand the importance of the prompt template:
Therefore I h…
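To make the point concrete, here is a minimal illustration of what a prompt template does: the same user question produces very different model inputs depending on the wrapper. Both template strings below are illustrative examples, not the exact templates privateGPT or any particular model ships with.

```python
# Two example prompt templates (illustrative only). A model fine-tuned on
# one format often performs poorly when fed the other, which is why
# matching the template to the model matters.
def apply_template(template: str, question: str) -> str:
    return template.format(question=question)

ALPACA_STYLE = (
    "Below is an instruction that describes a task. "
    "Write a response that completes the request.\n\n"
    "### Instruction:\n{question}\n\n### Response:\n"
)
CHATML_STYLE = "<|im_start|>user\n{question}<|im_end|>\n<|im_start|>assistant\n"

q = "What is the capital of France?"
print(apply_template(ALPACA_STYLE, q))
print(apply_template(CHATML_STYLE, q))
```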
-
UPDATE (08/09/2023):
We have done a major performance overhaul in the past few months, and now I'm happy to share the latest results:
- SOTA performance on CUDA: https://github.com/mlc-ai/llm-perf-b…
-
The [Transformer Python API](https://github.com/intel/intel-extension-for-transformers/tree/main/intel_extension_for_transformers/llm/runtime/graph#How-to-use-Transformer-based-API) section is not wor…
-
I feel this is a major bug, as anyone using Ollama with several models for an extended time will hit the same issue.
I'm using https://github.com/iplayfast/OllamaPlayground/tree/main/createnotes#…
-
Here is my HF format of the exllamav2 model:
```python
import torch, os
from contextlib import contextmanager
from pathlib import Path
from typing import Optional, List, Union, Dict
from transforme…
```
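The snippet above is truncated, but since it imports `contextmanager`, it presumably wraps part of the model setup in a context that temporarily overrides some default and restores it afterwards. A generic sketch of that pattern (all names here are hypothetical, not taken from the original code):

```python
from contextlib import contextmanager

# Hypothetical illustration of the contextmanager pattern: temporarily
# override a default setting (e.g. a default device or dtype) while a
# model is constructed, then restore it even if construction fails.
_DEFAULTS = {"device": "cpu"}

@contextmanager
def default_device(device: str):
    previous = _DEFAULTS["device"]
    _DEFAULTS["device"] = device
    try:
        yield
    finally:
        _DEFAULTS["device"] = previous  # always restored

with default_device("cuda"):
    inside = _DEFAULTS["device"]   # "cuda" while the context is active
after = _DEFAULTS["device"]        # "cpu" again after exit
```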
-
### System Info
GPT4All version 2.5.2, Windows 11, Ryzen 7 5800H processor, 32 GB RAM
### Information
- [ ] The official example notebooks/scripts
- [ ] My own modified scripts
### Reproduction
1) …