-
### 🐛 Describe the bug
index_select() applied to a sparse tensor can't backprop.
As the example below demonstrates:
```
import torch
i = torch.tensor([[0, 1]])
v = torch.tensor([2., 2.], requir…
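# For contrast (not part of the original report, which is truncated above):
# the dense equivalent of index_select backpropagates without error,
# which illustrates that the failure is specific to the sparse layout.
x = torch.tensor([2., 2.], requires_grad=True)
out = x.index_select(0, torch.tensor([0, 1]))
out.sum().backward()
print(x.grad)  # tensor([1., 1.])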
-
### What is the issue?
When I try `ollama run llama3.1:70b`, I get the error `Error: llama runner process has terminated: error loading model: unable to allocate backend buffer`
```
C:\Users\sol>olla…
-
I installed graphvite in an ubuntu18-cudnn7-cuda10.1-python3.7 Docker image.
The conda version is `conda 4.8.3`.
Running `conda list graphvite` gives the following result:
> \# packages in environment at …
-
### 🐛 Describe the bug
In the following Python program, `x + 2` gets traced as `torch.ops.aten.add.Tensor` instead of `torch.ops.aten.add.Scalar` which would be more technically correct (and is the o…
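The tracing behavior described above can be observed with a minimal sketch. This assumes `make_fx` from `torch.fx.experimental.proxy_tensor` is the tracer in question (the truncated report does not show its setup), and the function `f` is a hypothetical stand-in for the original program:

```python
import torch
from torch.fx.experimental.proxy_tensor import make_fx

def f(x):
    return x + 2  # scalar addend, but traced as a Tensor overload

gm = make_fx(f)(torch.randn(3))
# The printed graph records the call as torch.ops.aten.add.Tensor,
# not torch.ops.aten.add.Scalar.
print(gm.graph)
```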
-
When running rnn_product.py I get the following error, even though my GPU seems to have enough memory. Any ideas?
-------------------
trainable parameter count:
79772859
2017-12-22 23:31:59.579386…
-
When I run train.py, something goes wrong:
loss.backward(retain_graph=True)
File "/home/zhang/miniconda3/lib/python3.6/site-packages/torch/tensor.py", line 93, in backward
torch.autograd.…
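For context on the line raising the error: `retain_graph=True` keeps the autograd graph alive so a second backward pass over the same graph is possible. A minimal sketch (not the original model) of why it is needed:

```python
import torch

x = torch.tensor(1., requires_grad=True)
y = x * 2
y.backward(retain_graph=True)  # keep the graph for another pass
y.backward()                   # only works because of retain_graph above
# Gradients accumulate across the two passes: 2 + 2
print(x.grad)  # tensor(4.)
```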
-
I stumbled upon this issue when trying to convert a custom-trained Mask R-CNN model with an attached keypoint head using the R50-DC5 backbone to ONNX format.
At first I thought my model was the issue, but…
-
Hi,
I use Arch Linux with dual GPUs connected via NVLink. I installed `cuda` and `nccl` from the community repo.
```
cuda 11.8.0-1
nccl 2.15.5-1
```
I use the following command
`
CUDA_…
-
### 🐛 Describe the bug
Hello, when I am using DDP to train a model, I found that using multi-task loss and gradient checkpointing at the same time can lead to gradient synchronization failure betwe…
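One commonly cited interaction here is that reentrant activation checkpointing can interfere with DDP's gradient hooks; non-reentrant checkpointing (`use_reentrant=False`) lets autograd hooks fire normally. A minimal single-process sketch of that API (the module and shapes are hypothetical, not from the report):

```python
import torch
from torch.utils.checkpoint import checkpoint

# Non-reentrant checkpointing recomputes the forward during backward
# while keeping ordinary autograd hook behavior.
lin = torch.nn.Linear(4, 4)
x = torch.randn(2, 4, requires_grad=True)
y = checkpoint(lin, x, use_reentrant=False)
y.sum().backward()
print(x.grad.shape)  # torch.Size([2, 4])
```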
-
Hi author,
Good job. When I ran train.py, I met this error:
File "D:\mycodeapp\anaconda\envs\py7\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
result = self.forwa…