Open YanCote opened 3 years ago
Tasks are killed randomly on the CC Graham cluster. I just opened a ticket to get support; so far there is no way to understand what's going on.
This was finally fixed. Lesson learned: tf.data.Dataset.from_generator did not work with TF v1 in our setup. We had to replace the generator with map operations that load the images, and do the other data processing beforehand.
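For anyone hitting the same problem, here is a minimal sketch of what a map-based pipeline can look like. It is not the exact code from this repo: the names `load_image` and `build_dataset`, the JPEG format, the 224x224 size and the batch size are all assumptions made for illustration. It only uses `tf.data` calls that exist in late TF 1.x releases as well as TF 2.x.

```python
import tensorflow as tf


def load_image(path, label, image_size=(224, 224)):
    """Read one JPEG file and return a float32 image scaled to [0, 1].

    Hypothetical helper: JPEG input and the default size are assumptions.
    """
    contents = tf.io.read_file(path)
    image = tf.image.decode_jpeg(contents, channels=3)
    # compat.v1 name, so the same call is available in TF 1.x and 2.x
    image = tf.compat.v1.image.resize_images(image, list(image_size))
    image = tf.cast(image, tf.float32) / 255.0
    return image, label


def build_dataset(paths, labels, image_size=(224, 224), batch_size=32):
    """Build a dataset from file paths with map() instead of a generator."""
    dataset = tf.data.Dataset.from_tensor_slices((paths, labels))
    # Image loading runs inside the TF graph, not in a Python generator
    dataset = dataset.map(
        lambda p, l: load_image(p, l, image_size), num_parallel_calls=4)
    return dataset.batch(batch_size).prefetch(1)
```

In TF1 graph mode the result would typically be consumed with `tf.compat.v1.data.make_one_shot_iterator(dataset).get_next()` inside a session. Any preprocessing that cannot be expressed as TF ops would have to be done offline, before the job starts.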
So we are now able to run a TF1 script on a GPU on CC's Graham. We still need to validate running on multiple GPUs and find the optimal place to store our dataset.
Execute one Task on a Cluster (CC / MILA)
Find where to save the Datasets