-
# After quantizing the YOLOv5 model, mAP@0.5 is high (around 0.47), but the detection results are outrageous and unexpected
Over the past few days I have been trying to quantize yolov5_nano by …
-
## 🐛 Bug
`torch.mv` with a sparse matrix triggers an internal assert on CUDA, but works on CPU
## To Reproduce
device = "cuda"
vector = torch.tensor([0.1, 0.2], device=device)
indexes = torch.tensor(…
-
### 🐛 Describe the bug
When attempting to use `CaptumExplainer` to explain a graph-level prediction from a Heterogeneous GNN, I get the following traceback:
```
explanation = explainer(
Fil…
-
### Describe the bug
Training a model with an LSTM module causes a segmentation fault after loss.backward() has been called a random number of times. The number of times the training loop can run …
-
## Bug Description
The input and src tensors can be on the CPU, but the index tensor is cast to gpu:0, which leaves the src and index tensors on different devices. Cast it to the same…
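
A minimal sketch of the kind of fix described above, assuming the operation is a scatter-style call. The report is truncated and does not name the function, so `torch.Tensor.scatter_add` and the helper `scatter_add_same_device` are stand-ins chosen for illustration, not the code from the affected library. The idea is to derive the index tensor's device from `src` rather than hard-coding `gpu:0`:

```python
import torch

def scatter_add_same_device(input, dim, index, src):
    # Hypothetical helper: move index to wherever src lives
    # instead of casting it to a fixed device such as cuda:0.
    index = index.to(src.device)
    return input.scatter_add(dim, index, src)

# CPU-only example: input, src and index all stay on the CPU.
inp = torch.zeros(3)
src = torch.tensor([1.0, 2.0, 3.0])
index = torch.tensor([0, 1, 0])
out = scatter_add_same_device(inp, 0, index, src)
```

With this pattern the same call works whether the tensors are on the CPU or on a GPU, as long as `input` and `src` already share a device.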
-
### Discussed in https://github.com/openvinotoolkit/openvino_notebooks/discussions/2479
Originally posted by **matrix1233** October 28, 2024
Hello,
I followed the exact solution provided in…
-
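Hi, I noticed earlier that this project has been continuously testing models. I am currently using llamacpp; because it runs on our company's server, it is CPU-only, at about 10 t/s. That is sufficient for internal company use.
Models used for the knowledge base:
Qwen2.5-14B:q4
bge-reranker-base
Dmeta-embedding-zh-small
These are the models I am using. Would it be possible to change the project to run on CPU? Thank you very much.
# Start in the background noh…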
-
### 🐛 Describe the bug
I noticed a qualitative regression in the inference quality when compiling a segformer-b0 model from the `transformers` library. This regression was introduced in the 2.5.0 rel…
-
### 🐛 Describe the bug
Async NCCL communications from `torch.distributed` should run in parallel with CUDA compute kernels, but traces from `torch.profiler` show that this is not the case for the first run. …
-
**Is your feature request related to a problem? Please describe.**
Currently, the TensorFlow and ONNX backends in Triton support thread controls ([here](https://github.com/triton-inference-server/tens…