Closed by happinesslz 4 years ago
Could you check whether the process is still running on the CPU side? In the current training code, the first epoch is slower because the dataloader is loading the data for the first time. It might take several minutes before training on the GPU starts.
Also, training on the train split takes around 9 GB of memory. If the system runs out of memory, it may start swapping to disk, which can slow things down a lot.
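If it helps, here is a rough sketch for checking the memory and swap situation while training is running. It assumes a Linux machine where /proc/meminfo is available and reports a MemAvailable field. The helper names (parse_meminfo, swap_used_kb, memory_report) are made up for this example and are not part of the training code.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Most fields are reported in kB; a few (e.g. HugePages counters)
    are plain counts and are stored as-is.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "Key: value" shape
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def swap_used_kb(info):
    """Swap currently in use, in kB (0 if the fields are missing)."""
    return info.get("SwapTotal", 0) - info.get("SwapFree", 0)


def memory_report(info):
    """Summarize available memory and used swap in GiB."""
    kb_per_gib = 1024 * 1024
    return {
        "available_gib": info.get("MemAvailable", 0) / kb_per_gib,
        "swap_used_gib": swap_used_kb(info) / kb_per_gib,
    }


if __name__ == "__main__":
    # Assumption: Linux with /proc mounted; otherwise nothing is printed.
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            report = memory_report(parse_meminfo(f.read()))
        print("available: {available_gib:.1f} GiB, "
              "swap used: {swap_used_gib:.1f} GiB".format(**report))
```

If the available memory is well under the roughly 9 GB mentioned above and the swap usage keeps growing while the script runs, swapping is a likely cause of the slowdown.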
Thanks for your nice code. I am using tensorflow-gpu==1.15.0 to train your code on a single Titan Xp GPU, but GPU utilization stays at 0% and the memory usage is only 158M. I see the same problem on a GTX 1070. Can you explain why? Thanks.
I followed your README and ran the following command (for a single GPU; I also reduced the batch size and NUM_GPU):
CUDA_VISIBLE_DEVICES=7 python train.py configs/car_auto_T3_train_train_config configs/car_auto_T2_train_config