-
Since the latest 1.17.x versions, containers whose images are considered "legacy" and that do not set the `NVIDIA_IMEX_CHANNELS` environment variable fail to start with the following error:
```
Error…
-
The current PyTorch 2.0.1 does not support the H100 (sm_90). Can you update torch to at least 2.1?
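
One way to confirm this on a given machine is to compare the GPU's compute capability with the architectures the installed build was compiled for. The sketch below is only an illustration: the helper names `arch_tag` and `build_supports` are hypothetical, and it checks native `sm_XX` entries only, so a build that ships matching `compute_XX` PTX could still run through JIT compilation.

```python
def arch_tag(capability):
    """Turn a (major, minor) compute capability into a tag such as 'sm_90'."""
    major, minor = capability
    return f"sm_{major}{minor}"


def build_supports(arch_list, capability):
    """Return True if the arch list contains a native entry for the device."""
    return arch_tag(capability) in arch_list


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        # Architectures this PyTorch build was compiled for.
        archs = torch.cuda.get_arch_list()
        # Compute capability of the first visible GPU, e.g. (9, 0) on H100.
        cap = torch.cuda.get_device_capability(0)
        print(torch.__version__, archs, arch_tag(cap), build_supports(archs, cap))
    else:
        print("PyTorch with CUDA is not available; nothing to check.")
```

If the last printed value is `False`, the build has no native kernels for the device, which is the situation described above.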
-
### Description
Reshape for sparse `BCOO` arrays fails if the target shape contains dimensions of size 1 and there is at least one dense dimension.
```python
from jax.experimental import sparse
…
-
### Your current environment
```text
Collecting environment information...
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
…
-
(base) [root@app2 ~]# docker run --gpus all -p 1080:8000 -v /app:/root/.cache/huggingface/ 784630b8bc0a
==========
== CUDA ==
==========
CUDA Version 12.2.2
Container image Copyright (c) 2…
-
The pyxis post-install script never installs the NVIDIA Container CLI:
https://github.com/aws-samples/aws-parallelcluster-post-install-scripts/blob/main/pyxis/postinstall.sh#L45-L47
D…
-
### Is there an existing issue for this?
- [X] I have searched the existing issues
### Is your issue described in the documentation?
- [X] I have read the documentation
### Is your issue present i…
-
### Is your feature request related to a problem? Please describe.
In terms of tracking system metrics from a profiler and mlflow perspective, the current code lacks some feature to better support ot…
-
### Your current environment
The output of `python collect_env.py`
```text
Collecting environment information...
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch…
-
Hello,
I'm testing the default training configuration "combo_go2ARX5_pickle_reaching_extreme" and ran into some issues that I could use help with.
**Expected Training Outcome:** Without modifyin…