The training pipeline takes ~2 hours to run the training script for a single sample space, e.g. the Fort Collins region.
We are creating a CPU cluster and a GPU cluster with 4 nodes each, but the experiment logs suggest that only a single GPU node is being used for training. Is this due to a CPU bottleneck, or because we don't have distributed training code in place?
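If the distributed code is the missing piece, a minimal sketch of what a multi-node launch could look like with PyTorch's `torchrun` (assuming the training script uses `torch.distributed` / DDP internally; the script name, node count, and endpoint are illustrative, not taken from our pipeline):

```shell
# Hypothetical launch command, run on each of the 4 GPU nodes.
# --node_rank differs per node (0..3); <head-node-ip> is a placeholder.
# Assumes train.py calls torch.distributed.init_process_group and wraps
# the model in DistributedDataParallel.
torchrun \
  --nnodes=4 \
  --nproc_per_node=1 \
  --node_rank=0 \
  --rdzv_backend=c10d \
  --rdzv_endpoint=<head-node-ip>:29500 \
  train.py
```

If nothing like this appears in the training script or pipeline config, that would explain why only one GPU node is active regardless of cluster size.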
cc/ @geohacker