Closed harm-devries closed 1 year ago
For the model training it's important that we achieve a high throughput. We need to figure out the multi-node training configuration(i.e., what combination of data, tensor, and pipeline parallelism) for 192 V100 GPUs.
For the model training it's important that we achieve a high throughput. We need to figure out the multi-node training configuration(i.e., what combination of data, tensor, and pipeline parallelism) for 192 V100 GPUs.