Hi @Vaibhavjolly,
I just ran the Dex-Net 2.0 training a few times again and was not able to reproduce this error. Could you provide:
1) The exact command you are running. 2) The epoch/step at which you are seeing this. (I'm interested in how early-on it is happening.)
Thanks, Vishal
Hi @visatish
I followed your documentation: Training from scratch (Dex-Net 2.0).
I downloaded the dataset by running: ./scripts/downloads/datasets/download_dex-net_2.0.sh
And then executed: ./scripts/training/train_dex-net_2.0.sh
It did not complete a single epoch.
This is the exact log:
(py27) dl@dl-machine:~/gqcnn$ ./scripts/training/train_dex-net_2.0.sh
WARNING:root:Failed to import geometry msgs in rigid_transformations.py.
WARNING:root:Failed to import ros dependencies in rigid_transforms.py
WARNING:root:autolab_core not installed as catkin package, RigidTransform ros methods will be unavailable
root WARNING autolab_perception is not installed as a catkin package - ROS msg conversions will not be available for image wrappers
root WARNING autolab_perception is not installed as a catkin package - ROS msg conversions will not be available for image wrappers
root WARNING Unable to import pylibfreenect2. Python-only Kinect driver may not work properly.
root WARNING Unable to import openni2 driver. Python-only Primesense driver may not work properly
root WARNING Failed to import ROS in primesense_sensor.py. ROS functionality not available
root WARNING primesense_sensor.py not installed as catkin package. ROS functionality not available.
root WARNING Failed to import ROS in ensenso_sensor.py. ROS functionality not available
trimesh WARNING No FCL -- collision checking will not work
OpenGL.acceleratesupport INFO OpenGL_accelerate module loaded
OpenGL.arrays.arraydatatype INFO Using accelerated ArrayDatatype
GQCNNModelFactory INFO Initializing GQ-CNN with Tensorflow as backend...
root INFO Root logger now logging to /home/dl/gqcnn/tools/../models/GQCNN-2.0/training.log
GQCNNTrainerTF INFO Saving model to: /home/dl/gqcnn/tools/../models/GQCNN-2.0
GQCNNTrainerTF INFO Training split: image_wise found in dataset.
GQCNNTrainerTF INFO Percent positive in train: 0.1920775838367626
GQCNNTrainerTF INFO Percent positive in val: 0.19251506572445515
GQCNNTF INFO Initializing TF Session...
2019-04-13 17:26:58.957482: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2019-04-13 17:26:59.598018: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5639b82689b0 executing computations on platform CUDA. Devices:
2019-04-13 17:26:59.598106: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): GeForce GTX 1080 Ti, Compute Capability 6.1
2019-04-13 17:26:59.598133: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (1): GeForce GTX 1080 Ti, Compute Capability 6.1
2019-04-13 17:26:59.598161: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (2): GeForce GTX 1080 Ti, Compute Capability 6.1
2019-04-13 17:26:59.598186: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (3): GeForce GTX 1080 Ti, Compute Capability 6.1
2019-04-13 17:26:59.607981: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2200005000 Hz
2019-04-13 17:26:59.610983: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5639b8611bd0 executing computations on platform Host. Devices:
2019-04-13 17:26:59.611056: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): [...] rate instead of keep_prob. Rate should be set to rate = 1 - keep_prob.
GQCNNTF INFO Building Pose Stream...
GQCNNTF INFO Building Fully Connected Pose Layer: pc1...
GQCNNTF INFO Reinitializing layer pc1
GQCNNTF INFO Building Merge Stream...
GQCNNTF INFO Building Merge Layer: fc4...
GQCNNTF INFO Reinitializing layer fc4.
GQCNNTF INFO Building fully connected layer: fc5...
GQCNNTF INFO Reinitializing layer fc5.
GQCNNTF INFO Building Softmax Layer...
GQCNNTrainerTF INFO Beginning Optimization...
Process Process-1:
Traceback (most recent call last):
  File "/home/dl/miniconda3/envs/py27/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/dl/miniconda3/envs/py27/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "build/bdist.linux-x86_64/egg/gqcnn/training/tf/trainer_tf.py", line 1216, in _load_and_enqueue
    train_poses[start_i:end_i,:] = train_poses_arr.copy()
ValueError: could not broadcast input array from shape (64,7) into shape (64,1)
Process Process-2:
Traceback (most recent call last):
  File "/home/dl/miniconda3/envs/py27/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/dl/miniconda3/envs/py27/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "build/bdist.linux-x86_64/egg/gqcnn/training/tf/trainer_tf.py", line 1216, in _load_and_enqueue
    train_poses[start_i:end_i,:] = train_poses_arr.copy()
ValueError: could not broadcast input array from shape (64,7) into shape (64,1)
Process Process-3:
Traceback (most recent call last):
  File "/home/dl/miniconda3/envs/py27/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/dl/miniconda3/envs/py27/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "build/bdist.linux-x86_64/egg/gqcnn/training/tf/trainer_tf.py", line 1216, in _load_and_enqueue
    train_poses[start_i:end_i,:] = train_poses_arr.copy()
ValueError: could not broadcast input array from shape (64,7) into shape (64,1)
Hmm...this is weird. Sorry, but could you add a try/except around that line and print the following:
1) self.gripper_mode
2) self.pose_mean.shape and self.pose_std.shape (in case there is some weird broadcasting going on afterwards)
Essentially, this line is supposed to slice out the corresponding part of the saved pose tensor (dim 7) for training (should be dim 1 for the 'legacy_parallel_jaw' gripper mode).
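For anyone following along, here is a minimal sketch of that kind of diagnostic, written against plain NumPy arrays rather than the real trainer. The helper name assign_with_diagnostics and the stand-in arrays are hypothetical; only the shapes (64,7) and (64,1), the gripper mode string, and the assignment itself come from the traceback above.

```python
import numpy as np

batch_size = 64
# Destination buffer: 1-D pose per sample, as expected for 'legacy_parallel_jaw'.
train_poses = np.zeros((batch_size, 1), dtype=np.float32)
# Source batch that was NOT sliced down, so it still has all 7 pose columns.
train_poses_arr = np.random.rand(batch_size, 7).astype(np.float32)


def assign_with_diagnostics(dst, src, start_i, end_i,
                            gripper_mode, pose_mean, pose_std):
    """Hypothetical helper: try the assignment, report shapes if it fails."""
    try:
        dst[start_i:end_i, :] = src.copy()
        return None
    except ValueError as err:
        report = ("gripper_mode=%s pose_mean.shape=%s pose_std.shape=%s error=%s"
                  % (gripper_mode, pose_mean.shape, pose_std.shape, err))
        print(report)
        return report


# Stand-ins for self.gripper_mode, self.pose_mean and self.pose_std.
report = assign_with_diagnostics(
    train_poses, train_poses_arr, 0, batch_size,
    "legacy_parallel_jaw", np.zeros(7), np.ones(7))
```

With a (64,7) source and a (64,1) destination, NumPy raises the same ValueError as in the log, and the printed shapes show whether the mean/std were computed on the full or the sliced pose.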
Hi @visatish, I get the following output after printing those:
self.gripper_mode: legacy_parallel_jaw
self.pose_mean.shape: (7,)
self.pose_std.shape: (7,)
I want to train the model from scratch, but I am stuck here! If I comment out line 1216 (train_poses[start_i:end_i,:] = train_poses_arr.copy()), my training starts. But I shouldn't train with that line commented out, right? Also, may I know which version of TensorFlow you are using?
Hi @Vaibhavjolly,
When I run things on my end, both shapes are (1,). Do you have the latest version of the master branch? I am using TensorFlow 1.13.1, but I don't think that has anything to do with this.
To give you some visibility into what's going on: at this point, the pose should be converted from (7,) to (1,). This is done through read_pose_data, which is located here. A good first step would be to make sure that it is actually called.
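As a rough illustration of that conversion (not the actual gqcnn code), the sketch below uses a hypothetical stand-in called read_pose_data_sketch. Which column is kept for 'legacy_parallel_jaw' is an assumption here (index 2); the point is only that statistics computed after slicing come out as (1,), not (7,).

```python
import numpy as np


def read_pose_data_sketch(pose_arr, gripper_mode):
    """Hypothetical stand-in for read_pose_data: keep only the pose
    columns that the given gripper mode trains on."""
    if gripper_mode == "legacy_parallel_jaw":
        # Assumption: a single column (index 2) is kept for this mode.
        if pose_arr.ndim == 1:
            return pose_arr[2:3]
        return pose_arr[:, 2:3]
    raise ValueError("gripper mode %r not covered by this sketch" % gripper_mode)


raw_poses = np.random.rand(64, 7)  # saved pose tensor, dim 7
sliced = read_pose_data_sketch(raw_poses, "legacy_parallel_jaw")

# Statistics computed on the sliced poses have shape (1,).
pose_mean = sliced.mean(axis=0)
pose_std = sliced.std(axis=0)
print(sliced.shape, pose_mean.shape, pose_std.shape)
```

If the mean/std report (7,) instead, they were either computed before this slicing step or loaded from somewhere else.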
One thing I would try is deleting the model dir before you train again, which should be models/GQCNN-2.0 if you are using the provided shell script. The reason is that if the training script finds a cached pose mean/std already in there, it will try to use it. This shouldn't be a problem, but I just want to make sure we start with a clean slate.
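A minimal sketch of how such a cache could leave stale shapes behind, assuming a simple "load if present, else compute and save" pattern. The function load_or_compute_pose_stats and the file names pose_mean.npy / pose_std.npy are hypothetical and are not taken from the gqcnn source.

```python
import os
import shutil
import tempfile

import numpy as np


def load_or_compute_pose_stats(model_dir, poses):
    """Hypothetical cache pattern: reuse saved stats if they exist."""
    mean_path = os.path.join(model_dir, "pose_mean.npy")  # assumed file name
    std_path = os.path.join(model_dir, "pose_std.npy")    # assumed file name
    if os.path.exists(mean_path) and os.path.exists(std_path):
        # Cached values are trusted as-is, whatever shape they have.
        return np.load(mean_path), np.load(std_path)
    mean, std = poses.mean(axis=0), poses.std(axis=0)
    os.makedirs(model_dir, exist_ok=True)
    np.save(mean_path, mean)
    np.save(std_path, std)
    return mean, std


model_dir = os.path.join(tempfile.mkdtemp(), "GQCNN-2.0")

# Simulate a model dir that already holds stats for the full 7-D pose.
os.makedirs(model_dir)
np.save(os.path.join(model_dir, "pose_mean.npy"), np.zeros(7))
np.save(os.path.join(model_dir, "pose_std.npy"), np.ones(7))

sliced_poses = np.random.rand(64, 1)  # 1-D pose used for training
stale_mean, _ = load_or_compute_pose_stats(model_dir, sliced_poses)

# Deleting the model dir forces the stats to be recomputed.
shutil.rmtree(model_dir)
fresh_mean, _ = load_or_compute_pose_stats(model_dir, sliced_poses)
print(stale_mean.shape, fresh_mean.shape)
```

Under these assumptions the first call returns the stale (7,) mean and the second, after the directory is removed, returns (1,).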
In the meantime, I will continue to try to replicate the problem on my end.
Thanks, Vishal
Hi @visatish, Yes, it started training. That was indeed the problem: I deleted the pretrained model directory GQCNN-2.0 and then ran the training script, and training started. Thanks a lot, Vaibhav
Hi @Vaibhavjolly,
That's great to hear! Just to confirm, you uncommented this line: train_poses[start_i:end_i,:] = train_poses_arr.copy(), right?
Thanks, Vishal
Hi @visatish, Yes, I uncommented that! I am trying to understand the code. If you have any docs for it, could you please share them, or do you have any suggestions? Thanks! Vaibhav
Hi @Vaibhavjolly,
You can refer to the API docs here. This will give you a high-level overview of how to use the various classes.
Thanks, Vishal
Thanks! @visatish
I also hit the same error when using Dex-Net 2.0. I am just evaluating the pre-trained GQ-CNN model: $ ./scripts/policies/run_all_dex-net_2.0_examples.sh
Hi @visatish, I am trying to train the GQ-CNN model with Dex-Net 2.0 from scratch. Training starts, but after a few moments it throws the following error:
Process Process-1:
Process Process-3:
Traceback (most recent call last):
  File "/home/dl/miniconda3/envs/py27/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/dl/miniconda3/envs/py27/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "build/bdist.linux-x86_64/egg/gqcnn/training/tf/trainer_tf.py", line 1216, in _load_and_enqueue
    train_poses[start_i:end_i,:] = train_poses_arr.copy()
ValueError: could not broadcast input array from shape (64,7) into shape (64,1)
Traceback (most recent call last):
  File "/home/dl/miniconda3/envs/py27/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/dl/miniconda3/envs/py27/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "build/bdist.linux-x86_64/egg/gqcnn/training/tf/trainer_tf.py", line 1216, in _load_and_enqueue
    train_poses[start_i:end_i,:] = train_poses_arr.copy()
ValueError: could not broadcast input array from shape (64,7) into shape (64,1)
Process Process-2:
Traceback (most recent call last):
  File "/home/dl/miniconda3/envs/py27/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/dl/miniconda3/envs/py27/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "build/bdist.linux-x86_64/egg/gqcnn/training/tf/trainer_tf.py", line 1216, in _load_and_enqueue
    train_poses[start_i:end_i,:] = train_poses_arr.copy()
ValueError: could not broadcast input array from shape (64,7) into shape (64,1)
Thanks in advance ! Vaibhav