-
I'm not well versed in Python. Where do I put the downloaded llama-2-7b-chat.Q4_0.gguf file?
I can get llama.cpp working easily on my laptop, but I can't seem to get this to work.
I did git c…
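For reference, a minimal loading sketch, assuming the repo in question is intel-extension-for-transformers and that its GGUF path uses a `model_file` keyword (check the README of your checkout; the exact keyword has varied across versions). The GGUF file can stay wherever you downloaded it, as long as you point the loader at it:

```python
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

# The GGUF repo on the Hugging Face Hub and the specific quantized file in it.
model_name = "TheBloke/Llama-2-7B-Chat-GGUF"
gguf_file = "llama-2-7b-chat.Q4_0.gguf"

# The tokenizer comes from the original (non-GGUF) model repo.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model = AutoModelForCausalLM.from_pretrained(model_name, model_file=gguf_file)
```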
-
Hi LP!
I stumbled on your repo here since we're building our own alternative to backprop at aolabs.ai, using weightless neural networks.
Unless I'm missing something, MNIST has only 70k samples …
-
The Intel GPU Flex 140 has two GPUs per card, with a memory capacity of 12 GB (6 GB per GPU). Currently, I can run inference only on one GPU device, with limited memory. Could you please guide me on how to run…
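A starting point, assuming Intel Extension for PyTorch (IPEX) with the XPU backend: each tile of a Flex 140 appears as its own `xpu` device, so one model replica per tile works whenever the model fits in 6 GB. Splitting one larger model across both tiles needs tensor or pipeline parallelism (e.g. DeepSpeed AutoTP), which is beyond this sketch:

```python
import copy
import torch
import intel_extension_for_pytorch as ipex  # registers the 'xpu' device type

print(torch.xpu.device_count())  # one Flex 140 card should report 2 devices

# Placeholder model standing in for the real one; one replica per tile.
model = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.ReLU())
model0 = copy.deepcopy(model).to("xpu:0")
model1 = copy.deepcopy(model).to("xpu:1")

# Route half of each batch to each tile (simple data parallelism).
x = torch.randn(8, 16)
y0 = model0(x[:4].to("xpu:0"))
y1 = model1(x[4:].to("xpu:1"))
```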
-
**backend w/ rag**
I already installed the requirements:
```
pip install -r ~/intel-extension-for-transformers/intel_extension_for_transformers/neural_chat/pipeline/plugins/retrieval/requirements.txt
```
…
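For comparison, a minimal retrieval-enabled backend sketch following the neural_chat plugin pattern (the `./docs` path and the question are placeholders):

```python
from intel_extension_for_transformers.neural_chat import PipelineConfig, build_chatbot, plugins

# Turn on the retrieval (RAG) plugin and point it at a folder of documents.
plugins.retrieval.enable = True
plugins.retrieval.args["input_path"] = "./docs"  # placeholder path

config = PipelineConfig(plugins=plugins)
chatbot = build_chatbot(config)
print(chatbot.predict("What do the docs say about installation?"))
```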
-
Hello, I am trying to replicate the GraphRAG demo on an Intel Arc A770 GPU, but I am getting the issue below.
I am facing an issue with Mistral:
```
12:33:38,271 httpx INFO HTTP Request: POST http://localhost:11434…
```
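The log shows httpx POSTing to localhost:11434, which is Ollama's default port; a quick sanity check that the Mistral model is actually reachable there, outside of GraphRAG (an assumption about the setup, so adjust the model tag to match yours):

```python
import httpx

# Hit Ollama's generate endpoint directly to confirm the model responds.
r = httpx.post(
    "http://localhost:11434/api/generate",
    json={"model": "mistral", "prompt": "Say hello.", "stream": False},
    timeout=120.0,
)
r.raise_for_status()
print(r.json()["response"])
```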
-
This is using the example code only:
```
from transformers import AutoTokenizer, TextStreamer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM
model_name = "Intel/neur…
```
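For reference, the untruncated example presumably continues along these lines (a sketch matching the ITREX README pattern; the full model id here is an assumed completion of the truncated `Intel/neur…`):

```python
from transformers import AutoTokenizer, TextStreamer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

model_name = "Intel/neural-chat-7b-v3-1"  # assumed completion of the truncated id
prompt = "Once upon a time, there existed a little girl,"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer(prompt, return_tensors="pt").input_ids
streamer = TextStreamer(tokenizer)

# 4-bit weight-only quantization happens on load; generation streams tokens.
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_4bit=True)
outputs = model.generate(inputs, streamer=streamer, max_new_tokens=300)
```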
-
Loading the saved model runs into the following error.
It also takes a very long time to run and save quantized models.
```
2024-03-21 08:48:58 [INFO] loading weights file models/4_bit_llama2-rtn/model.sa…
```
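A minimal save/reload round trip for orientation, assuming ITREX models expose the usual HF-style `save_pretrained`/`from_pretrained` pair and that reloading a saved low-bit checkpoint skips re-quantization (both assumptions; the directory name mirrors the log above):

```python
from intel_extension_for_transformers.transformers import AutoModelForCausalLM

saved_dir = "models/4_bit_llama2-rtn"  # directory taken from the log above

# One-time quantization (4-bit weight-only on load), then save the result.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", load_in_4bit=True)
model.save_pretrained(saved_dir)

# Later runs should reload the already-quantized weights from saved_dir.
loaded = AutoModelForCausalLM.from_pretrained(saved_dir)
```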
-
Hi, I am trying to run vLLM serving for the neural-chat model using https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/vLLM-Serving . However, I am facing this issue:
![image](htt…
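Once the server does come up, a quick client-side check, assuming the ipex-llm vLLM example exposes vLLM's usual OpenAI-compatible endpoint on the default port (both assumptions; the model id is a placeholder):

```python
import json
import urllib.request

req = urllib.request.Request(
    "http://localhost:8000/v1/completions",  # assumed default host/port
    data=json.dumps({
        "model": "Intel/neural-chat-7b-v3-1",  # placeholder model id
        "prompt": "Hello,",
        "max_tokens": 16,
    }).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))
```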
-
```
from intel_extension_for_transformers.neural_chat import PipelineConfig
from intel_extension_for_transformers.neural_chat import build_chatbot
from intel_extension_for_transformers.neural_chat impor…
```
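Filled out, the quickstart those imports lead to usually looks like this (a sketch of the documented default pipeline; the prompt is a placeholder):

```python
from intel_extension_for_transformers.neural_chat import PipelineConfig, build_chatbot

config = PipelineConfig()       # default model, device, and plugins
chatbot = build_chatbot(config)
print(chatbot.predict("Tell me about Intel Xeon Scalable Processors."))
```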
-
Hi all,
I'm attempting to follow the SmoothQuant tutorial for the LLAMA2-7b model: [https://github.com/intel/neural-compressor/tree/master/examples/onnxrt/nlp/huggingface_model/text_generation/llam…
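For what it's worth, the SmoothQuant recipe in Neural Compressor's Python API boils down to the sketch below. Note this is the PyTorch-side API, while the linked tutorial targets ONNX Runtime, so its exact steps differ; the toy model and calibration data are stand-ins for the real LLM and dataset:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from neural_compressor import PostTrainingQuantConfig, quantization

# Toy stand-ins for the real LLM and its calibration set.
model = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.ReLU(), torch.nn.Linear(16, 4))
calib_loader = DataLoader(
    TensorDataset(torch.randn(32, 16), torch.zeros(32, dtype=torch.long)),
    batch_size=8,
)

# alpha balances how much quantization difficulty is shifted from
# activations onto weights; 0.5 is the common starting point.
conf = PostTrainingQuantConfig(
    recipes={"smooth_quant": True, "smooth_quant_args": {"alpha": 0.5}}
)
q_model = quantization.fit(model, conf, calib_dataloader=calib_loader)
q_model.save("./saved_results")
```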