-
On a system with P100 GPUs, with the following Spack environment (where I explicitly set `cuda_arch=60` for some packages):
```yaml
spack:
  config:
    install_tree: /my-spack/spack
    build…
-
Release test **long_running_horovod_tune_test.aws** failed. See https://buildkite.com/ray-project/release-tests-branch/builds/2075#018a34b7-dcc4-4090-9e85-1b201acedc8a for more details.
Managed by OS…
-
Now that we use `DataLoader` (v1) again (c0ac991b0363ad285269070740e9635b6ce478c8, fixed https://github.com/rwth-i6/returnn/issues/1382), we can directly use the `num_workers` option.
In 5b569b3ef5…
-
During the fine-tuning phase, due to the lack of multiple GPUs in our lab, I have commented out all Horovod-related code in the distributed training code. Under these circumstances, when running ./dem…
-
**System information**
* OS Platform and Distribution: Linux Ubuntu 16.04
* TensorFlow version == 2.8.3 and was installed through pip
* TensorFlow-Recommenders-Addons version : '0.6.0-dev' and was …
-
Hi, I have recently been encountering the following errors when I try to fine-tune using the provided pretrained models.
1. When I cloned the original repo and tried finetuning on CORD per the instructi…
-
I was attempting to use distributed TensorFlow when I noticed I could not add the 10th GPU on my node to a distributed strategy. After running nccl-tests, I noticed it appears to be an issue with NC…
-
**Environment:**
1. Framework: TensorFlow
2. Framework version: 1.15.0
3. Horovod version: 0.19.5
4. MPI version: 4.0.0
5. CUDA version: 10.0
6. Python version: 3.6.8
**Bug report:**
When I wa…
-
## Describe the bug
It is not exactly a bug, but I am wondering whether it is possible to install torchrl with PyTorch version 2.1.0+cu121.
## To Reproduce
I have a specific environment which I ca…
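
For reference, a version string such as `2.1.0+cu121` has a numeric release part and a local build tag after the `+`. The following is only a minimal sketch of splitting such a string so the two parts can be compared separately. The helper name `split_local_version` is hypothetical and is not part of torchrl or PyTorch.

```python
from typing import Optional, Tuple


def split_local_version(version: str) -> Tuple[Tuple[int, ...], Optional[str]]:
    """Split 'X.Y.Z+tag' into a numeric release tuple and an optional local tag.

    Hypothetical helper for illustration only; it assumes the release part
    consists of dot-separated integers.
    """
    release, sep, local = version.partition("+")
    numbers = tuple(int(part) for part in release.split("."))
    # If there was no '+', there is no local build tag.
    return numbers, (local if sep else None)


release, local = split_local_version("2.1.0+cu121")
print(release, local)
```

With the release available as a tuple of integers, a requirement such as "at least 2.1.0" can be checked with ordinary tuple comparison, independently of the build tag.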
-
## Issue
With TF 2.11, **mixed precision** and the **model-checkpoint callback** do not work properly in combination.
```python
import tensorflow as tf
from tensorflow import keras
keras.mixed_…