System information
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
Python version: 3.7
Installed using pip or ROS: pip
GPU model (if applicable): GeForce GTX 1080 Ti
CPU memory (RAM): 24 GB
@visatish
Describe the result you are trying to replicate
04-21 18:00:03 GQCNNTrainerTF INFO Step 51878 (epoch 1.234), 0.01 s
04-21 18:00:03 GQCNNTrainerTF INFO Minibatch loss: 0.379, learning rate: 0.0095
04-21 18:00:03 GQCNNTrainerTF INFO Minibatch error: 8.594
04-21 18:00:03 GQCNNTrainerTF INFO Step took 0.556 sec.
04-21 18:00:03 GQCNNTrainerTF INFO Max 0.7454133
04-21 18:00:03 GQCNNTrainerTF INFO Min 0.0010097245
04-21 18:00:03 GQCNNTrainerTF INFO Pred nonzero 25
04-21 18:00:03 GQCNNTrainerTF INFO True nonzero 31
04-21 18:00:03 GQCNNTrainerTF INFO Step 51879 (epoch 1.234), 0.01 s
04-21 18:00:03 GQCNNTrainerTF INFO Minibatch loss: 0.509, learning rate: 0.0095
04-21 18:00:03 GQCNNTrainerTF INFO Minibatch error: 28.125
04-21 18:00:03 GQCNNTrainerTF INFO Step took 0.054 sec.
04-21 18:00:03 GQCNNTrainerTF INFO Max 0.28531963
04-21 18:00:03 GQCNNTrainerTF INFO Min 1.0366358e-05
04-21 18:00:03 GQCNNTrainerTF INFO Pred nonzero 0
04-21 18:00:03 GQCNNTrainerTF INFO True nonzero 0
04-21 18:00:03 GQCNNTrainerTF INFO Step 51880 (epoch 1.234), 0.0 s
04-21 18:00:03 GQCNNTrainerTF INFO Minibatch loss: 0.179, learning rate: 0.0095
04-21 18:00:03 GQCNNTrainerTF INFO Minibatch error: 0.0
04-21 18:00:04 GQCNNTrainerTF INFO Step took 0.101 sec.
04-21 18:00:04 GQCNNTrainerTF INFO Max 0.6219147
04-21 18:00:04 GQCNNTrainerTF INFO Min 2.1433334e-05
04-21 18:00:04 GQCNNTrainerTF INFO Pred nonzero 18
04-21 18:00:04 GQCNNTrainerTF INFO True nonzero 42
04-21 18:00:04 GQCNNTrainerTF INFO Step 51881 (epoch 1.234), 0.0 s
04-21 18:00:04 GQCNNTrainerTF INFO Minibatch loss: 0.503, learning rate: 0.0095
04-21 18:00:04 GQCNNTrainerTF INFO Minibatch error: 31.25
04-21 18:29:39 GQCNNTrainerTF INFO Cleaning and preparing to exit optimization...
04-21 18:29:41 GQCNNTrainerTF INFO Terminating prefetch queue workers...
04-21 18:29:49 GQCNNTrainerTF INFO Flushing prefetch queue...
Describe the unexpected behavior
As you can see in the log, training shut down unexpectedly partway through epoch 1. I have tried three times, and it happens every time. How can I fix this? My guess is that the process is running out of CPU memory, but 24 GB of RAM should be enough.
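To check whether memory exhaustion is actually the cause, one option is to log the process's resident memory periodically and see whether it grows steadily before the shutdown. Below is a minimal sketch using only the Python standard library; `log_memory` is a hypothetical helper (not part of gqcnn) that you would call from the training loop or a wrapper script.

```python
import resource


def log_memory(tag=""):
    """Print and return this process's peak resident set size.

    On Linux, ru_maxrss is reported in kilobytes (on macOS it is
    bytes), so the value below assumes a Linux host such as the
    Ubuntu 16.04 machine in this report.
    """
    peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print("[mem]%s peak RSS: %.1f MB" % (tag, peak_kb / 1024.0))
    return peak_kb


# Hypothetical usage: call once per N training steps, or simply watch
# `free -m` (CPU) and `nvidia-smi` (GPU) in another terminal while
# training runs. A steady climb toward the 24 GB limit before the
# "Cleaning and preparing to exit optimization..." message would point
# to the prefetch queue workers accumulating data in RAM.
peak = log_memory(" step 51878")
```

If memory does climb, reducing the prefetch queue size or number of queue workers in the training configuration is a reasonable first thing to try.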