-
@vpax writes [on Slack](https://zeekorg.slack.com/archives/CTGDG83B8/p1663345124790709):
> For a fresh checkout of master under macOS Monterey, I get a warning when building:
>
> Linking CXX sha…
-
https://github.com/horovod/horovod/blob/master/docs/elastic.rst
It would be better if we supported elastic training.
-
Hi all, I am running TensorFlow benchmarks inside the horovod-docker to evaluate the models in distributed mode. I have installed the Mellanox driver and the GPUDirect RDMA API, and loaded the GPUDirect kern…
-
Hello,
I wanted to try out your code and came across an issue with the PyTorch dependencies.
I installed all the requirements in a fresh conda environment with `python 3.7.11` via your `require…
-
Hi, I recently tried to reproduce your results and can get the metrics you described in the paper, but I ran into a problem: training takes much longer (almost 3 days) than you described in the paper (less than 12 hours).
…
-
Hi team, I have an example based on the latest NVIDIA image nvcr.io/nvidia/tensorflow:24.07-tf2-py3, but I run the MPI job on different nodes. However, it complains that the launcher could not identify the work…
-
I noticed that in model.py, "gin_channels" is passed to DiffusionGenerator.
I would like to know whether Grad-TTS supports multi-speaker TTS training.
Can you also provide a pretrained model trained…
-
**Environment:**
1. Framework: TensorFlow
2. Framework version: 2.12.0
3. Horovod version: horovod-0.27.0
4. MPI version: openmpi-4.1.5
5. CUDA version: 11.8
6. NCCL version:
![image](https://…
-
Suspicion: this is probably an optimizer issue. Optimizers such as Adam store first- and second-order moment estimates; could that state be messing up the process?
Also,
if we prin…
-
When I ran tf_cnn_benchmarks with and without Horovod, I got different evaluation sequences.
Without horovod:
python tf_cnn_benchmarks.py --data_dir ${HOME}/mldl/data/imagenet --model resnet50 --b…