-
Hello, and thank you for the excellent work!
In the paper it says:
> The first stage takes about 50 hours on a single 4x NVIDIA A100 machine (global batch size 128 with gradient accumulation). …
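
For context, here is a minimal sketch of the arithmetic behind "global batch size 128 with gradient accumulation" on 4 GPUs. The per-GPU micro-batch sizes below are hypothetical, since the quoted passage does not state them, and `accumulation_steps` is an illustrative helper, not something taken from the paper or its code.

```python
def accumulation_steps(global_batch: int, num_gpus: int, per_gpu_micro_batch: int) -> int:
    """Return how many micro-batches each GPU must accumulate per optimizer step.

    Uses the usual relation:
        global_batch = num_gpus * per_gpu_micro_batch * accumulation_steps
    """
    samples_per_pass = num_gpus * per_gpu_micro_batch
    if global_batch % samples_per_pass != 0:
        raise ValueError(
            f"global batch {global_batch} is not divisible by "
            f"{num_gpus} GPUs x micro-batch {per_gpu_micro_batch}"
        )
    return global_batch // samples_per_pass


if __name__ == "__main__":
    # Global batch size 128 and 4 GPUs come from the quoted sentence;
    # the micro-batch sizes are assumptions chosen for illustration only.
    for micro_batch in (4, 8, 16, 32):
        steps = accumulation_steps(128, 4, micro_batch)
        print(f"micro-batch {micro_batch:>2} per GPU -> {steps} accumulation step(s)")
```

With a smaller micro-batch, each GPU needs more accumulation steps to reach the same global batch, which is one reason wall-clock time can differ between setups even when the global batch size matches.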
-
### Your current environment
The output of `python collect_env.py`
2024-10-25 10:53:08.913038: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different…
-
### What is the version?
3.3.0 and higher
### What happened?
We're seeing segfaults in our EKS environment (running IPv6 clusters) when running dcgm-exporter 3.3.0 and higher (DCGM 3.3.3+) - we do …
-
Traceback (most recent call last):
  File "E:\ComfyUI_3D\python_embeded\Lib\site-packages\torch\utils\cpp_extension.py", line 2105, in _run_ninja_build
    subprocess.run(
  File "subprocess.py", l…
-
### 🐛 Describe the bug
When trying to `torch.compile` a module that contains `torch.clear_autocast_cache` we get the attached error. I believe this is expected but wondering if there is an establishe…
-
Hi there,
I'm using Ubuntu 22.04.3 LTS on WSL2, and I ran into a problem while following the guide in the README:
```
(eureka) logan@DESKTOP-0TD40GD:~$ nvidia-smi
Mon Aug 19 17:05:07 2024
+-----…
-
## Motivation
Right now we have a single accelerator profile in RHOAI, which doesn't give us the granularity to distinguish between different GPUs (e.g. A100 vs. V100) in the RHOAI dashboard.
## Completi…
-
### NVIDIA Open GPU Kernel Modules Version
555.42.02
### Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific…
-
### Windows Version
Windows 10 [19045.2728]
### WSL Version
1.1.6.0
### Are you using WSL 1 or WSL 2?
- [X] WSL 2
- [ ] WSL 1
### Kernel Version
Linux version 5.15.90.1-microsoft-standard-WSL2
…
-
**Describe the bug**
Our model runs fine on the CPU, but when we try to run it on the GPU using EMGU.CV.Cuda support, we get an unhandled exception.
**OS / Platform**
e.g. Ubuntu 22.04
** .Ne…