Open mengdong opened 5 years ago
@mengdong Thanks for raising this issue.
We routinely work with ResNet and other complicated models so I don't think that the complexity is the issue.
Are you using the encoding where channels are in dim=3?
Could you reproduce this behavior with a single ResNet unit and print the entire trace?
Hello @eladeban,
Thanks for the prompt response. I don't think complexity is the issue either, as LeNet hits a similar error. I suspect the additional nodes/ops created by the TensorFlow Estimator interface. I will try TensorFlow Slim to see how it works, since it seems you have had more success with it.
I took another look. It might be the reduce_mean in line 542.
Can you apply the regularizer to take its inputs prior to that?
Quick question: are you using channels_first?
Sorry for the late reply. Yes, I am using channels_first. Let me modify the regularizer and give it a try.
Hello, thank you for looking into the code. I have tried changing the output_boundary to:
name: "resnet_model/block_layer4"
op: "Identity"
input: "resnet_model/Relu_48"
device: "/replica:0/task:0/device:GPU:0"
attr {
  key: "T"
  value {
    type: DT_FLOAT
  }
}
The entire trace is here:
I0910 17:29:56.384171 139780389660480 op_regularizer_manager.py:122] OpRegularizerManager starting analysis from: [<tf.Operation 'resnet_model/block_layer4' type=Identity>].
I0910 17:29:56.385807 139780389660480 op_regularizer_manager.py:125] OpRegularizerManager found 618 ops and 53 sources.
Traceback (most recent call last):
File "imagenet_main.py", line 391, in <module>
absl_app.run(main)
File "/home/dongm/python-virtual-env/tftot/lib/python3.6/site-packages/absl/app.py", line 300, in run
_run_main(main, args)
File "/home/dongm/python-virtual-env/tftot/lib/python3.6/site-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "imagenet_main.py", line 385, in main
run_imagenet(flags.FLAGS)
File "imagenet_main.py", line 378, in run_imagenet
shape=[DEFAULT_IMAGE_SIZE, DEFAULT_IMAGE_SIZE, NUM_CHANNELS])
File "/home/dongm/workspace/laptop_mapping/morph-net/morph_net/examples/resnet/resnet_run_loop.py", line 705, in resnet_main
max_steps=flags_obj.max_train_steps)
File "/home/dongm/python-virtual-env/tftot/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 367, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/home/dongm/python-virtual-env/tftot/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1156, in _train_model
return self._train_model_distributed(input_fn, hooks, saving_listeners)
File "/home/dongm/python-virtual-env/tftot/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1219, in _train_model_distributed
self._config._train_distribute, input_fn, hooks, saving_listeners)
File "/home/dongm/python-virtual-env/tftot/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1299, in _actual_train_model_distributed
self.config))
File "/home/dongm/python-virtual-env/tftot/lib/python3.6/site-packages/tensorflow_core/python/distribute/distribute_lib.py", line 1810, in call_for_each_replica
return self._call_for_each_replica(fn, args, kwargs)
File "/home/dongm/python-virtual-env/tftot/lib/python3.6/site-packages/tensorflow_core/python/distribute/one_device_strategy.py", line 356, in _call_for_each_replica
return fn(*args, **kwargs)
File "/home/dongm/python-virtual-env/tftot/lib/python3.6/site-packages/tensorflow_estimator/python/estimator/estimator.py", line 1146, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "imagenet_main.py", line 347, in imagenet_model_fn
label_smoothing=flags.FLAGS.label_smoothing
File "/home/dongm/workspace/laptop_mapping/morph-net/morph_net/examples/resnet/resnet_run_loop.py", line 398, in resnet_model_fn
gamma_threshold=1e-3
File "/home/dongm/workspace/laptop_mapping/morph-net/morph_net/network_regularizers/flop_regularizer.py", line 72, in __init__
regularizer_blacklist=regularizer_blacklist)
File "/home/dongm/workspace/laptop_mapping/morph-net/morph_net/framework/op_regularizer_manager.py", line 137, in __init__
['%s (%s)' % (o.name, o.type) for o in self._op_deque])
RuntimeError: OpRegularizerManager could not handle ops: ['resnet_model/batch_normalization_31/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_36/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_35/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_34/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_39/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_38/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_37/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_42/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_41/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_40/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_46/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_45/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/Pad_6 (Pad)', 'resnet_model/batch_normalization_44/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_49/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_48/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_47/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_52/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_51/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_50/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_43/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_24/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_11/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_1/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/Pad (Pad)', 'resnet_model/batch_normalization/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_4/FusedBatchNormV3 (FusedBatchNormV3)', 
'resnet_model/batch_normalization_3/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/max_pooling2d/MaxPool (MaxPool)', 'resnet_model/initial_max_pool (Identity)', 'resnet_model/batch_normalization_2/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_7/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_6/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_5/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_10/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_9/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_8/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_14/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_13/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/Pad_2 (Pad)', 'resnet_model/batch_normalization_12/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_17/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_16/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_15/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_20/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_19/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_18/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_23/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_22/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_21/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_27/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_26/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/Pad_4 (Pad)', 'resnet_model/batch_normalization_25/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_30/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_29/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_28/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_33/FusedBatchNormV3 (FusedBatchNormV3)', 'resnet_model/batch_normalization_32/FusedBatchNormV3 (FusedBatchNormV3)']
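With an error message this long, it can help to tally the unhandled ops by type before digging further. The following is a minimal sketch using only the Python standard library. `summarize_unhandled_ops` is a hypothetical helper written for illustration, not part of MorphNet, and the message passed to it is a shortened excerpt of the error above.

```python
import re
from collections import Counter

# Matches entries of the form 'op/name (OpType)' as they appear in the
# RuntimeError message. \s+ also tolerates a line break inside an entry.
_OP_ENTRY = re.compile(r"'([^']+?)\s+\((\w+)\)'")


def summarize_unhandled_ops(message):
    """Return a Counter mapping op type -> number of unhandled ops.

    Hypothetical helper for illustration; not a MorphNet API.
    """
    return Counter(op_type for _name, op_type in _OP_ENTRY.findall(message))


# Shortened excerpt of the error message, for illustration only.
example = (
    "RuntimeError: OpRegularizerManager could not handle ops: "
    "['resnet_model/batch_normalization_31/FusedBatchNormV3 (FusedBatchNormV3)', "
    "'resnet_model/Pad_6 (Pad)', "
    "'resnet_model/max_pooling2d/MaxPool (MaxPool)', "
    "'resnet_model/initial_max_pool (Identity)', "
    "'resnet_model/batch_normalization/FusedBatchNormV3 (FusedBatchNormV3)']"
)

summary = summarize_unhandled_ops(example)
for op_type, count in summary.most_common():
    print(op_type, count)
```

In the full message, almost every entry is a FusedBatchNormV3 op, with a handful of Pad, MaxPool, and Identity ops.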
channels_first is the problem. We assume channels_last...
Note that you need to use channels_last only during structure learning; later you could revert to the (faster?) channels_first.
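For reference, the two layouts differ only in where the channel axis sits: channels_first is (batch, channels, height, width), while channels_last is (batch, height, width, channels), which puts channels in dim=3. The sketch below shows the axis permutations on plain shape tuples. The helper names and the example shape are assumptions made for illustration. In TensorFlow the same permutation can be passed to `tf.transpose`, for example `tf.transpose(x, perm=[0, 2, 3, 1])`, although switching the model's data format, if it exposes such an option, is likely simpler than transposing tensors by hand.

```python
# Axis permutations between the two 4-D image layouts.
NCHW_TO_NHWC = (0, 2, 3, 1)  # channels_first -> channels_last
NHWC_TO_NCHW = (0, 3, 1, 2)  # channels_last -> channels_first


def permute_shape(shape, perm):
    """Reorder a shape tuple the way a transpose with `perm` would."""
    return tuple(shape[axis] for axis in perm)


def to_channels_last(shape):
    """(N, C, H, W) -> (N, H, W, C)."""
    return permute_shape(shape, NCHW_TO_NHWC)


def to_channels_first(shape):
    """(N, H, W, C) -> (N, C, H, W)."""
    return permute_shape(shape, NHWC_TO_NCHW)


# A batch of 32 RGB images at 224x224 (values chosen for illustration).
nchw = (32, 3, 224, 224)
nhwc = to_channels_last(nchw)
print(nhwc)                     # (32, 224, 224, 3)
print(to_channels_first(nhwc))  # (32, 3, 224, 224)
```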
I see. Let me try this again. Thanks for clarifying.
Hello,
I have tried a few examples from tensorflow/models with MorphNet (LeNet and ResNet). A simple MNIST model (https://github.com/mengdong/morph-net/blob/master/morph_net/examples/mnist/mnist-tutorial.py) works. However, I ran into problems with some more complex models under the TensorFlow Estimator interface.
Is there a recommended way to use MorphNet with the TensorFlow Estimator interface? I know the Estimator adds quite a bit of overhead to the graph. Detailed information below:
Regarding lenet (https://github.com/mengdong/morph-net/blob/master/morph_net/examples/mnist/mnist.py) from https://github.com/tensorflow/models/tree/master/official/mnist, I observe that:
Regarding ResNet (https://github.com/mengdong/morph-net/blob/master/morph_net/examples/resnet/imagenet_main.py), I observe: