-
As a Windows user, I tried to compile this and found that the problem was in these two files, "```flash_fwd_launch_template.h```" and "```flash_bwd_launch_template.h```", under "```./flash-attention/csrc/fl…
-
### Your current environment
```text
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
…
-
Hi,
The code sample below - which is based on an example in Matthew Honnibal's blog post "Against LLM maximalism" (https://explosion.ai/blog/against-llm-maximalism) - fails to produce any output. This i…
-
I am facing difficulties specifying GPU usage for different models in an LLM inference pipeline using vLLM. Specifically, I have 4 RTX 4090 GPUs available, and I aim to run an LLM with a size of 42GB …
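One common way to pin each model to specific GPUs, sketched here under the assumption that each model runs in its own process, is to restrict `CUDA_VISIBLE_DEVICES` before vLLM initializes. The model path and parallelism degree below are placeholders, and the vLLM call itself is commented out because it needs real GPUs:

```python
import os

# Hypothetical layout for 4 RTX 4090s: give the 42GB model two GPUs
# (tensor parallelism across them) and leave GPUs 2-3 free for other
# models launched from separate processes. This must be set before
# vLLM (and CUDA) initializes, or it has no effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

# from vllm import LLM
# llm = LLM(model="path/to/42gb-model", tensor_parallel_size=2)

print(os.environ["CUDA_VISIBLE_DEVICES"])
```

A second process would set `CUDA_VISIBLE_DEVICES="2,3"` for the remaining models, so each vLLM instance only ever sees its own slice of the hardware.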
-
Using bigdl-llm in a production environment, Python performance is too poor. Could you provide an inference library in C++ with an OpenAI-compatible API?
-
### What is the issue?
It was working fine with 2x 7900 XTX, but after I added a new graphics card, the output looks like this:
![imagen](https://github.com/ollama/ollama/assets/118543481/cb8024…
-
### Describe your problem
I select the corresponding knowledge base on the web page, upload multiple PDF files, and start parsing one or more files. Parsing often gets stuck (one or two days do n…
-
Hey there,
I'm running
**LocalAI version:**
`docker run --rm -ti --gpus all -p 8080:8080 -e DEBUG=true -v $PWD/models:/models --name local-ai localai/localai:latest-aio-gpu-nvidia-cuda-12 -…
-
# Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [x] I am running the latest code. Development is very rapid so there are no tagged versions as of…
-
### Feature request
Export to ONNX fails for opset 9 with T5
### Motivation
ONNX opset 9 is required by SNPE, the Qualcomm accelerator SDK. By supporting ONNX opset 9, we will unleash ML on the e…