BerkeleyAutomation / gqcnn

Python module for GQ-CNN training and deployment with ROS integration.
https://berkeleyautomation.github.io/gqcnn
Other
306 stars 149 forks

Issue: Bug/Performance Issue [Replication] #115

Open JohnsonQi opened 4 years ago

JohnsonQi commented 4 years ago

System information

Describe the result you are trying to replicate

04-21 18:00:03 GQCNNTrainerTF INFO Step 51878 (epoch 1.234), 0.01 s
04-21 18:00:03 GQCNNTrainerTF INFO Minibatch loss: 0.379, learning rate: 0.0095
04-21 18:00:03 GQCNNTrainerTF INFO Minibatch error: 8.594
04-21 18:00:03 GQCNNTrainerTF INFO Step took 0.556 sec.
04-21 18:00:03 GQCNNTrainerTF INFO Max 0.7454133
04-21 18:00:03 GQCNNTrainerTF INFO Min 0.0010097245
04-21 18:00:03 GQCNNTrainerTF INFO Pred nonzero 25
04-21 18:00:03 GQCNNTrainerTF INFO True nonzero 31
04-21 18:00:03 GQCNNTrainerTF INFO Step 51879 (epoch 1.234), 0.01 s
04-21 18:00:03 GQCNNTrainerTF INFO Minibatch loss: 0.509, learning rate: 0.0095
04-21 18:00:03 GQCNNTrainerTF INFO Minibatch error: 28.125
04-21 18:00:03 GQCNNTrainerTF INFO Step took 0.054 sec.
04-21 18:00:03 GQCNNTrainerTF INFO Max 0.28531963
04-21 18:00:03 GQCNNTrainerTF INFO Min 1.0366358e-05
04-21 18:00:03 GQCNNTrainerTF INFO Pred nonzero 0
04-21 18:00:03 GQCNNTrainerTF INFO True nonzero 0
04-21 18:00:03 GQCNNTrainerTF INFO Step 51880 (epoch 1.234), 0.0 s
04-21 18:00:03 GQCNNTrainerTF INFO Minibatch loss: 0.179, learning rate: 0.0095
04-21 18:00:03 GQCNNTrainerTF INFO Minibatch error: 0.0
04-21 18:00:04 GQCNNTrainerTF INFO Step took 0.101 sec.
04-21 18:00:04 GQCNNTrainerTF INFO Max 0.6219147
04-21 18:00:04 GQCNNTrainerTF INFO Min 2.1433334e-05
04-21 18:00:04 GQCNNTrainerTF INFO Pred nonzero 18
04-21 18:00:04 GQCNNTrainerTF INFO True nonzero 42
04-21 18:00:04 GQCNNTrainerTF INFO Step 51881 (epoch 1.234), 0.0 s
04-21 18:00:04 GQCNNTrainerTF INFO Minibatch loss: 0.503, learning rate: 0.0095
04-21 18:00:04 GQCNNTrainerTF INFO Minibatch error: 31.25
04-21 18:29:39 GQCNNTrainerTF INFO Cleaning and preparing to exit optimization...
04-21 18:29:41 GQCNNTrainerTF INFO Terminating prefetch queue workers...
04-21 18:29:49 GQCNNTrainerTF INFO Flushing prefetch queue...

Describe the unexpected behavior

As you can see in the log, training shuts down unexpectedly. I have tried three times and it happens every time. How can I fix this? It may be some kind of CPU overflow, but I think my CPU is big enough.
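One way to narrow down where the run stalled is to look for the longest pause between consecutive timestamps in the log. The following is a minimal sketch, assuming the `MM-DD HH:MM:SS` prefix seen in the log above. `largest_gap` is a hypothetical helper written for this report, not part of gqcnn, and the year is a placeholder because the log does not record one.

```python
import re
from datetime import datetime

# Matches the "MM-DD HH:MM:SS" prefix used by the trainer log lines.
STAMP = re.compile(r"^(\d{2}-\d{2} \d{2}:\d{2}:\d{2}) ")

# Placeholder year (assumption): the log has no year, and only
# differences between timestamps within one run matter here.
PLACEHOLDER_YEAR = "2000"


def largest_gap(lines):
    """Return (gap_seconds, line_before, line_after) for the longest pause."""
    stamped = []
    for line in lines:
        m = STAMP.match(line)
        if m:
            ts = datetime.strptime(
                PLACEHOLDER_YEAR + "-" + m.group(1), "%Y-%m-%d %H:%M:%S"
            )
            stamped.append((ts, line))

    best = (0.0, None, None)
    # Compare each timestamped line with the one that follows it.
    for (t0, l0), (t1, l1) in zip(stamped, stamped[1:]):
        gap = (t1 - t0).total_seconds()
        if gap > best[0]:
            best = (gap, l0, l1)
    return best


# Last lines of the log from this report.
excerpt = [
    "04-21 18:00:04 GQCNNTrainerTF INFO Minibatch loss: 0.503, learning rate: 0.0095",
    "04-21 18:00:04 GQCNNTrainerTF INFO Minibatch error: 31.25",
    "04-21 18:29:39 GQCNNTrainerTF INFO Cleaning and preparing to exit optimization...",
    "04-21 18:29:41 GQCNNTrainerTF INFO Terminating prefetch queue workers...",
]

gap, before, after = largest_gap(excerpt)
print(f"{gap:.0f} s between:\n  {before}\n  {after}")
```

On this excerpt the longest pause is about half an hour, between the last minibatch line and the cleanup message. That may mean the process was blocked for that time before exiting, or only that lines were left out of the paste.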

JohnsonQi commented 4 years ago

[Image: WechatIMG56]

[Image: 541587481539_ pic]