-
During model inference, the model weights are frozen and do not change between iterations. The CPU prefers a special weight layout to accelerate execution, so we need to prepack the model weights before model…
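
A minimal sketch of the prepacking idea, assuming a hypothetical `prepack_weight` helper and block size; real CPU kernels (e.g. oneDNN) choose their own blocked layouts, so this only illustrates the one-time reordering done before inference:

```python
import torch

def prepack_weight(weight: torch.Tensor, block: int = 16) -> torch.Tensor:
    """Reorder a [out_features, in_features] weight into [out_blocks, in_features, block]
    so a GEMM micro-kernel reads a contiguous block of output channels at a time."""
    out_features, in_features = weight.shape
    assert out_features % block == 0, "pad out_features to a multiple of block first"
    packed = (
        weight.reshape(out_features // block, block, in_features)
              .transpose(1, 2)   # -> [out_blocks, in_features, block]
              .contiguous()      # materialize the new layout once, up front
    )
    return packed

# Done once at model load time; every inference step then reuses w_packed.
w = torch.randn(4096, 4096)
w_packed = prepack_weight(w)
```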
-
### Your current environment
```text
Collecting environment information...
WARNING 07-22 09:16:28 _custom_ops.py:14] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C…
-
### Checked other resources
- [X] I added a very descriptive title to this issue.
- [X] I searched the LangChain documentation with the integrated search.
- [X] I used the GitHub search to find a sim…
-
### Your current environment
Why is it important:
This is a prerequisite to the work on enabling torch.compile on vLLM; we need to be able to build vLLM with nightly so that we can iterate on chan…
-
I found that the latest open-source LLM from Google, Gemma, has two versions of its model structure:
1. https://github.com/google/gemma_pytorch/blob/main/gemma/model_xla.py
2. https://github.com/google/gemma_…
-
### 🚀 The feature, motivation and pitch
[vLLM](https://github.com/vllm-project/vllm) is a high-throughput and memory-efficient inference and serving engine for LLMs. We would like to use `torch.compi…
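
For context, a minimal sketch of what applying `torch.compile` to a model's forward pass looks like; the `TinyBlock` module is a made-up stand-in, and the actual vLLM integration points are not shown here:

```python
import torch

class TinyBlock(torch.nn.Module):
    """Stand-in module; vLLM's real model layers are much more involved."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.proj = torch.nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.nn.functional.gelu(self.proj(x))

model = TinyBlock()
compiled = torch.compile(model)        # traces and fuses the forward graph
out = compiled(torch.randn(8, 64))     # first call compiles; later calls reuse the compiled graph
```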
-
```
D:\MiniCPM\venv\Lib\site-packages\torch\_tensor.py:962: UserWarning: The operator 'aten::pow.Scalar_out' is not currently supported on the ocl backend. Please open an issue at for requesting supp…
-
INFO 02-07 11:14:13 llm_engine.py:70] Initializing an LLM engine with config: model='/root/local_model_root/model/llama-2-7b/modelscope/Llama-2-7b-chat-ms', tokenizer='/root/local_model_root/model/lla…
-
**LocalAI version:**
`local-ai` 2.1.0 (TrueCharts chart Version: 6.6.1)
Cublas Cuda 11 + FFmpeg image
**Environment, CPU architecture, OS, and Version:**
uname -a
```
Linux truenas 6.1.63-prod…
-
### Your current environment
```text
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.3 LTS (x86_64)
GCC ve…