problem when running train.py

dddson commented 6 years ago

Hey, I'm getting this error when I run train.py. Can you help me?

WARNING:tensorflow:tf.op_scope(values, name, default_name) is deprecated, use tf.name_scope(name, default_name, values)
WARNING:tensorflow:tf.op_scope(values, name, default_name) is deprecated, use tf.name_scope(name, default_name, values)
WARNING:tensorflow:tf.op_scope(values, name, default_name) is deprecated, use tf.name_scope(name, default_name, values)
WARNING:tensorflow:tf.op_scope(values, name, default_name) is deprecated, use tf.name_scope(name, default_name, values)
Traceback (most recent call last):
  File "C:\Python36\lib\site-packages\tensorflow\python\framework\common_shapes.py", line 686, in _call_cpp_shape_fn_impl
    input_tensors_as_shapes, status)
  File "C:\Python36\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Negative dimension size caused by subtracting 3 from 1 for 'InceptionV3/InceptionV3/Conv2d_2a_3x3/Conv2D' (op: 'Conv2D') with input shapes: [227,1,1,32], [3,3,32,32].

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\davidson\Desktop\face\train.py", line 202, in <module>
    tf.app.run()
  File "C:\Python36\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "C:\Users\davidson\Desktop\face\train.py", line 130, in main
    logits = model_fn(md['nlabels'], images, 1-FLAGS.pdrop, True)
  File "C:\Users\davidson\Desktop\face\model.py", line 89, in inception_v3
    net, end_points = inception_v3_base(images, scope=scope)
  File "C:\Python36\lib\site-packages\tensorflow\contrib\slim\python\slim\nets\inception_v3.py", line 117, in inception_v3_base
    net = layers.conv2d(net, depth(32), [3, 3], scope=end_point)
  File "C:\Python36\lib\site-packages\tensorflow\contrib\framework\python\ops\arg_scope.py", line 181, in func_with_args
    return func(*args, **current_args)
  File "C:\Python36\lib\site-packages\tensorflow\contrib\layers\python\layers\layers.py", line 1033, in convolution
    outputs = layer.apply(inputs)
  File "C:\Python36\lib\site-packages\tensorflow\python\layers\base.py", line 671, in apply
    return self.__call__(inputs, *args, **kwargs)
  File "C:\Python36\lib\site-packages\tensorflow\python\layers\base.py", line 575, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "C:\Python36\lib\site-packages\tensorflow\python\layers\convolutional.py", line 167, in call
    outputs = self._convolution_op(inputs, self.kernel)
  File "C:\Python36\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 835, in __call__
    return self.conv_op(inp, filter)
  File "C:\Python36\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 499, in __call__
    return self.call(inp, filter)
  File "C:\Python36\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 187, in __call__
    name=self.name)
  File "C:\Python36\lib\site-packages\tensorflow\python\ops\gen_nn_ops.py", line 630, in conv2d
    data_format=data_format, name=name)
  File "C:\Python36\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "C:\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 2958, in create_op
    set_shapes_for_outputs(ret)
  File "C:\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 2209, in set_shapes_for_outputs
    shapes = shape_func(op)
  File "C:\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 2159, in call_with_requiring
    return call_cpp_shape_fn(op, require_shape_fn=True)
  File "C:\Python36\lib\site-packages\tensorflow\python\framework\common_shapes.py", line 627, in call_cpp_shape_fn
    require_shape_fn)
  File "C:\Python36\lib\site-packages\tensorflow\python\framework\common_shapes.py", line 691, in _call_cpp_shape_fn_impl
    raise ValueError(err.message)
ValueError: Negative dimension size caused by subtracting 3 from 1 for 'InceptionV3/InceptionV3/Conv2d_2a_3x3/Conv2D' (op: 'Conv2D') with input shapes: [227,1,1,32], [3,3,32,32].

Thank you! @dpressel
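
Editor's note: the arithmetic behind "subtracting 3 from 1" can be sketched in a few lines. This is a rough illustration, not TensorFlow's own code. The function name `valid_conv_output_size` is hypothetical, and the layer parameters (3x3 kernels, stride 2 for `Conv2d_1a_3x3`, stride 1 for `Conv2d_2a_3x3`, no padding) are assumed from the slim Inception V3 definition rather than stated in this thread.

```python
def valid_conv_output_size(input_size, kernel_size, stride=1):
    """Output length along one spatial axis for an unpadded (VALID) convolution."""
    span = input_size - kernel_size
    if span < 0:
        # Same situation TensorFlow reports during shape inference.
        raise ValueError(
            "Negative dimension size caused by subtracting %d from %d"
            % (kernel_size, input_size))
    return span // stride + 1


# With the usual 299-pixel Inception input, the first two layers are fine.
after_1a = valid_conv_output_size(299, 3, stride=2)  # assumed Conv2d_1a_3x3
after_2a = valid_conv_output_size(after_1a, 3)       # assumed Conv2d_2a_3x3

# The traceback shows a 1x1 spatial input reaching the second 3x3 convolution.
try:
    valid_conv_output_size(1, 3)
except ValueError as err:
    message = str(err)

# Working backwards: which input sizes leave only 1 pixel after the first layer?
candidates = [s for s in range(3, 300)
              if valid_conv_output_size(s, 3, stride=2) == 1]
```

Under these assumptions, only a height and width of 3 or 4 at the network input would produce the `[227,1,1,32]` shape in the error, which suggests the image tensor's dimensions are not in the layout the model expects.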

dpressel commented 6 years ago

I am not sure how you are using it, but it's working for me as documented in the README.md (I am running TF 1.5):

$ python train.py --train_dir ~/dev/work/AgeGenderDeepLearning/Folds/tf/age_test_fold_is_0 --max_steps 15000 --model_type inception --batch_size 32 --eta 0.001 --dropout 0.5 --pre_model /data/pre-trained/inception_v3.ckpt

Did you run preproc on your images as documented?

dddson commented 6 years ago

I downgraded to your version of TF just now, but it's still showing the same error. I basically adapted my convert_to_tf to produce the same output as yours, but I'm using IMDB and WIKI instead of Adience.

This is my JSON file for age: {"num_valid_shards": 4, "num_train_shards": 20, "valid_counts": 116, "train_counts": 115456, "timestamp": "2018-05-02 03:15:49.088705", "nlabels": 100}

And this is the record output, which matches yours:

features {
  feature {
    key: "image/class/label"
    value { int64_list { value: 54 } }
  }
  feature {
    key: "image/encoded"
    value { bytes_list { value: "\377\330\377\340\000\020.................\273r\305\247~U\265\225\357\177\231\377\331" } }
  }
  feature {
    key: "image/filename"
    value { bytes_list { value: "1839578_1955-12-16_2010.jpg" } }
  }
  feature {
    key: "image/height"
    value { int64_list { value: 256 } }
  }
  feature {
    key: "image/width"
    value { int64_list { value: 256 } }
  }
}

dpressel commented 6 years ago

At first glance, it looks like the bands from your exporter might be messed up (i.e., not in the order the trainer expects), but I can try to replicate this. It might take me a while as I am very busy, but it seems like it should not be too hard. LMK if there are any details I will need to recreate it.

mingrongchen commented 5 years ago

I also encountered the same problem, please tell me how to solve it.

dpressel commented 5 years ago

This appears to be an issue with the dataset. It looks like by the time it hits the trainer it has only a single channel, but the trainer expects 3-band data. Please check the input data carefully and make sure you are passing the right thing.

Also, I’m open to merging support for this dataset if somebody gets it running and sends a PR.
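
Editor's note: one way to act on the single-channel diagnosis above is to check how many channels each JPEG declares before it is written into a record. The sketch below reads that count straight from the JPEG start-of-frame header using only the standard library. It is an illustration under assumptions, not part of rude-carnie: `jpeg_shape` and `is_three_channel` are hypothetical helper names, and it assumes the bytes being checked are the same ones stored under `image/encoded`.

```python
import struct

# Start-of-frame markers (SOF0..SOF15), minus the three codes in that range
# that are not frame headers: DHT (0xC4), JPG (0xC8) and DAC (0xCC).
SOF_MARKERS = set(range(0xC0, 0xD0)) - {0xC4, 0xC8, 0xCC}


def jpeg_shape(data):
    """Return (height, width, channels) declared in a JPEG's frame header."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("corrupt JPEG: expected a marker at byte %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:
            # Fill byte before the real marker code; skip it.
            pos += 1
            continue
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:
            # TEM and RSTn markers carry no length field.
            pos += 2
            continue
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker in SOF_MARKERS:
            # Payload layout: precision (1 byte), height (2), width (2), channels (1).
            _precision, height, width, channels = struct.unpack(
                ">BHHB", data[pos + 4:pos + 10])
            return height, width, channels
        # Any other segment: skip the marker plus its declared length.
        pos += 2 + length
    raise ValueError("no start-of-frame segment found")


def is_three_channel(data):
    """True when the JPEG declares exactly three colour components."""
    return jpeg_shape(data)[2] == 3
```

Running `is_three_channel` over the files before conversion would flag any grayscale images, which could then be converted to RGB or skipped. Whether that is the actual cause here is not confirmed in this thread.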

dpressel / rude-carnie

problem when running train.py #78