-
### 🐛 Describe the bug
!!! Exception during processing!!! free_upper_bound + pytorch_used_bytes[device]
-
## Description
When adding NDArrays that live on different contexts, I get either:
- a warning about the differing contexts: GPU 0 -> CPU
- an error followed by a crash: GPU 0 -> GPU N with N != 0
## Environment info (Required)
…
-
### 🐛 Describe the bug
```
import torch
model = torch.nn.EmbeddingBag(num_embeddings=49157, embedding_dim=32, mode="sum")
a = torch.tensor([[39906]]).long()
example_args = (a,)
model_eval = mo…
-
### 🐛 Describe the bug
I was recently testing bmm performance on H100. When batch size is 1, the number is normal and expected. However, when I increase batch size to 2 and 4, the TFLOPS number dro…
-
Has anyone encountered the following problem? I used SiD-LSG to distill an SDXL model (made some code adaptations to the text-encoder), and some color spots appeared on the face, which were very obvio…
-
Currently we draw the entire editing grid (waveform, grid lines, notes, markers) all at once. This will cause performance issues for very large (>> 10 min) maps.
A better solution would be to draw …
-
[//]: # "SPDX-FileCopyrightText: Copyright (c) 2022-2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved."
[//]: # "SPDX-License-Identifier: Apache-2.0"
[//]: # ""
[//]: # "Licensed under the …
-
### Proposal to improve performance
I have observed that TTFT increases linearly with the total number of batched tokens.
For example, given a 100k batch
- TTFT is around 2min when an average prompt…
-
### 🐛 Describe the bug
I tried using torch.mm to compute a block matrix multiplication in separate blocks instead of computing the result at once, but I found that the results of the two computations are not close. For exa…
-
### 🐛 Describe the bug
The following snippet will work while `use_grad` is true, but will crash once the EmbeddingBag has its weights frozen.
```python
import torch
import torch.nn.functional a…