-
When I run convert.py, my CPU usage is over 70% while GPU usage is only 1%. Is that normal?
Is there a way to make my GPU do the work?
-
**LocalAI version:**
v1.21.0
**Environment, CPU architecture, OS, and Version:**
x86, GPU, Docker
**Describe the bug**
GPU offloading works, but no chat response is returned
**To Reproduce**
chat with mod…
-
In some issues we discussed offloading functionality to the DNN HTML editor provider.
Examples: #158 #168 #162 #38
I spoke to @skamphuis and he mentioned something I think @mitchelsellers refere…
-
| Field | Value |
|--------------------|----|
| Bugzilla Link | [PR49533](https://bugs.llvm.org/show_bug.cgi?id=49533) |
| Status | NEW |
| Importance | P normal |
|…
-
I followed the README, but I can't get llama-cpp to run on my 4090.
```
set CMAKE_ARGS=-DLLAMA_CUBLAS=on
set FORCE_CMAKE=1
pip install llama-cpp-python --upgrade --force-reinstall --no-cache-di…
```
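Once the CUDA build installs cleanly, offloading can be verified from Python; a minimal sketch, assuming a local GGUF model at `./model.gguf` (hypothetical path):

```python
from llama_cpp import Llama

# n_gpu_layers=-1 asks llama.cpp to offload every layer to the GPU;
# the startup log should then report layers assigned to CUDA.
llm = Llama(
    model_path="./model.gguf",  # hypothetical local model file
    n_gpu_layers=-1,
)
out = llm("Q: Name the planets in the solar system. A:", max_tokens=32)
print(out["choices"][0]["text"])
```

If the log still shows all layers on the CPU, the wheel was likely built without cuBLAS and the reinstall step above needs to be rerun.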
-
During optimisation (including profiling runs), I ended up using three or four different versions of a transformation script (original vs. with tiling, and profiled vs. non-profiled). This makes it …
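One common way to avoid the diverging copies is a single script whose variants are selected by flags; a minimal sketch, where `apply_tiling` and `apply_profiling` are hypothetical stand-ins for the real transformations:

```python
import argparse

def apply_tiling(state):
    """Hypothetical stand-in for the tiling transformation."""
    state["tiled"] = True
    return state

def apply_profiling(state):
    """Hypothetical stand-in for profiling instrumentation."""
    state["profiled"] = True
    return state

def transform(state, tiling=False, profiling=False):
    # One script with two toggles replaces four diverging copies.
    if tiling:
        state = apply_tiling(state)
    if profiling:
        state = apply_profiling(state)
    return state

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--tiling", action="store_true")
    parser.add_argument("--profiling", action="store_true")
    args = parser.parse_args()
    print(transform({}, tiling=args.tiling, profiling=args.profiling))
```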
-
### Description
We have encountered an issue where editing offloaded images directly from the WordPress media library (e.g., rotating or cropping) causes the images to appear broken in the media li…
-
Any reason why mistralai_mistral-7b-instruct-v0.2 does not offload to the GPU?
```
load INSTRUCTOR_Transformer
max_seq_length 512
Starting get_model: llama
Failed to listen to n_gpus: No modu…
```
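A failed GPU-module import like the one above usually means the Python environment cannot see CUDA at all; a minimal sanity check with PyTorch (assuming the stack expects a CUDA build of torch):

```python
import torch

# If this prints False, no layers can be offloaded regardless of the
# model settings; the usual fix is installing a CUDA-enabled torch build.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```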
-
**_Reported by Paul Sokolovsky:_**
Users of devices that provide socket and TCP/IP offload engines would benefit in memory and power efficiency by enabling full offload of the Zephyr BSD socket APIs…
-
```python
from unsloth import FastLanguageModel
import torch

max_seq_length = 2048  # Choose any! We auto support RoPE Scaling internally!
dtype = None  # None for auto detection. Float16 for Tesla T4, V100, B…
```
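For context, these variables normally feed straight into the loader; a sketch of the typical next step, with `unsloth/mistral-7b-bnb-4bit` as an illustrative (assumed) checkpoint name:

```python
# Hypothetical continuation of the snippet above.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-bnb-4bit",  # illustrative checkpoint
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=True,  # 4-bit quantization cuts VRAM usage substantially
)
```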