-
### 🐛 Describe the bug
The following code may fail:
```
import torch
from torch import nn
class A(nn.Module):
    def __init__(self):
        super().__init__()
        self.p = nn.Par…
-
**Describe the bug**
I tried to view geopm reports on a GPU workload with annotated regions. Some GPU signals (e.g., GPU_POWER) were reporting `nan` in regions that were using those GPUs.
**GEOPM …
-
### 🐛 Describe the bug
Hi,
When I launched multi-process training (8x A100) using `torchvision.datasets.ImageNet()` with a freshly prepared `root` (i.e. containing only `ILSVRC2012_devkit_t12.tar.gz…
-
### What is the issue?
We're using service containers as a form of basic local up for our developers. This means that we've got a basic set of features available during either development or a loca…
-
### 🐛 Describe the bug
Working with LLMs, I got an unexpectedly large CUDA OOM error. I was using `torch.svd_lowrank`, which in turn calls `torch._lowrank.get_approximate_basis`. Below I paste the minimal…
-
### Your current environment
```text
_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow …
-
I saw an article about the speed of pthread_mutex implementations (https://justine.lol/mutex/)
and tried the test program it mentions (a highly contended scenario) with OpenWatcom.
The test program uses…
-
### How can we reproduce the crash?
Running two modules simultaneously in a turborepo app crashes with "out of memory" errors (although the modules are not that large).
Other times, errors like: `memor…
-
### 🐛 Describe the bug
When an export using `torch.jit.trace` involves a custom autograd function (activation checkpointing), a runtime error occurs.
Code snippets (https://github.com/haoheliu/AudioLDM…
-
Not sure how this is possible, but the following code throws a segfault:
```julia
using Oceananigans
using Oceananigans.BoundaryConditions: fill_halo_regions!
partition = Partition(y=2)
arch …