Hi, I am trying to train the new model on the server.
The GPU model is Tesla V100, batch_size=8.
But an epoch takes about 30 minutes, GPU-util is sometimes very low, and training seems to get slower and slower as the epochs go on. Is that normal?