Pyramid level customization

mooratov commented 4 years ago

TLDR, This feature accomplishes the following:

allows user to pass custom list of pyramid levels via config file
these pyramid levels are then parsed in all appropriate places.
allows use of pyramid layer 2 (P2) for resnet backbone
one minor unrelated bug fix in debug.py ( made--show-annotations a separate flag from --annotations, which is already in use)

This PR would allow a user to use a custom set of pyramid levels, controlled by the config.ini file in a manner similar to how anchor_parameters are controlled now.

Our team has found that using a P2 pyramid layer enables us to utilize ultra-fine (e.g. 3 pixel by 3 pixel) bounding boxes that are otherwise out of reach with current configurable parameters. Some of the code used here was contributed by my team member Brad Nelson (https://github.com/bnelsj) in our private fork.

I tried to stick to stylistic conventions wherever I could, and it looked like utils/anchors.py already had a framework for configuring the pyramid_levels list. I simply implemented this more fully in the bin scripts. Unfortunately, there was a fair amount of re-writing the retinanet.py definition and making it somewhat uglier.

In order to make P2 work, I had to modify resnet.py so that it would return 4 output layers to use as backbone, while no other backbone type currently returns 4 output layers. Therefore, pyramid level 2 (P2) only works with resnet backbones as written. I tried to explain these changes with informative errors and comments.

The primary use of this feature would be to add P2, although a user can also exclude P7 and P6 if they desired. I did not allow users to exclude C3, C4, and C5.

Note that the length of pyramid_levels must be the same as the list of sizes and strides in anchor_parameters, although this is not explicitly enforced.

mooratov commented 4 years ago

Thank you so much for the detailed feedback! I believe I've made all the stylistic changes you requested.

To answer your question, yes it would be fantastic if all backbone models returned dicts with "C" keys and layers as values, and FPN function would return dict with "P" keys! I was thinking of doing something like this, but figured it would be an even bigger overhaul and would require me to touch ALL of the models, many of which I haven't even tested out before.

Are you asking me to make these changes as well prior to merge?

hgaiser commented 4 years ago

Are you asking me to make these changes as well prior to merge?

Yeah if you don't mind, I think it would be a good change.

mooratov commented 4 years ago

OK I updated all backbone models, retinanet FPN function, and retinanet initialization all working with dicts. I noticed that it was relatively easy to get densenet to work with P2 layers as well, so i enabled this capability.

The other backbones only return C3, C4, and C5, as previously designed. Note that of the remaining models, I was only able to successfully run vggnet (keras 2.3.1, tf 1.14) but i don't believe this is due to any changes that I've made.

mooratov commented 4 years ago

Thanks again for the feedback. I've made the requested changes, and have finally started using a linter on my editor, which helped me catch a few more stylistic lapses as well as a few genuine bugs. I've confirmed that train.py, evaluate.py, convert_model.py, and debug.py all work as intended now.

There are a few issues that can potentially arise however if users are not careful with configuring anchor parameters and pyramid levels. I am not sure to what extent you want me to foolproof this type of stuff, but I thought I would bring them to your attention at the very least:

pyramid_levels and anchor_params.sizes lists must have the same length or the code won't work. This can probably be fixed with an assert statement in both the retinanet and retinanet_bbox functions in retinanet.py to ensure they have the same length.
On a similar note, anchor_params.strides and anchor_params.sizes should have the same length ( this is an issue in the code independent of my PR ). Can be fixed with an assert in the config.py function that reads these anchor parameters (or elsewhere)
If you use evaluate.py on a model that was trained with non-standard pyramid levels, you MUST provide the config.ini file to set the pyramid levels properly, or else it will crash ungracefully due to tensor dimensionality mismatch. The crash occurs when eval.py runs model.predict_on_batch(..). This one would be more tricky for me to try fixing right now.

hgaiser commented 4 years ago

Good points, I think the asserts are relatively easy to add and can provide somewhat of a security against wrong configurations. The third point is indeed a bit more tricky, I would leave that as it is for now.

mooratov commented 4 years ago

ok all done!

hgaiser commented 4 years ago

Looks good to me, thanks again for your contribution :)

MishallSwain commented 4 years ago

Unable to train after this commit. Please help.

Traceback (most recent call last): File "train.py", line 540, in main() File "train.py", line 500, in main config=args.config File "train.py", line 114, in create_models model = model_with_weights(backbone_retinanet(num_classes, num_anchors=num_anchors, modifier=modifier), weights=weights, skip_mismatch=True) File "../../keras_retinanet/models/resnet.py", line 38, in retinanet return resnet_retinanet(*args, backbone=self.backbone, kwargs) File "../../keras_retinanet/models/resnet.py", line 99, in resnet_retinanet resnet = keras_resnet.models.ResNet50(inputs, include_top=False, freeze_bn=True) File "/home/ubuntu/.local/lib/python3.7/site-packages/keras_resnet/models/_2d.py", line 188, in ResNet50 return ResNet(inputs, blocks, numerical_names=numerical_names, block=keras_resnet.blocks.bottleneck_2d, include_top=include_top, classes=classes, *args, *kwargs) File "/home/ubuntu/.local/lib/python3.7/site-packages/keras_resnet/models/_2d.py", line 66, in ResNet x = keras_resnet.layers.BatchNormalization(axis=axis, epsilon=1e-5, freeze=freeze_bn, name="bn_conv1")(x) File "/anaconda/envs/tensorflow_p36/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 922, in call outputs = call_fn(cast_inputs, args, kwargs) File "/anaconda/envs/tensorflow_p36/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 265, in wrapper raise e.ag_error_metadata.to_exception(e) TypeError: in user code:

/home/ubuntu/.local/lib/python3.7/site-packages/keras_resnet/layers/_batch_normalization.py:17 call  *
    return super(BatchNormalization, self).call(training=(not self.freeze), *args, **kwargs)

TypeError: type object got multiple values for keyword argument 'training'

mooratov commented 4 years ago

Hi, I believe that this is related to the Keras 2.4.0 release that happened a few days ago. It appears to break some fundamental compatibility (possibly because it forces installation of keras-resnet==0.2.0

pip uninstall keras pip install keras==2.3.0

fixed it for me. Not a satisfying fix, but also this is unrelated to the newly merged PR

On Tue, Jun 30, 2020 at 1:28 AM Mishall Swain notifications@github.com wrote:

Unable to train after this commit. Please help.

Traceback (most recent call last): File "train.py", line 540, in main() File "train.py", line 500, in main config=args.config File "train.py", line 114, in create_models model = model_with_weights(backbone_retinanet(num_classes, num_anchors=num_anchors, modifier=modifier), weights=weights, skip_mismatch=True) File "../../keras_retinanet/models/resnet.py", line 38, in retinanet return resnet_retinanet(*args, backbone=self.backbone, kwargs) File "../../keras_retinanet/models/resnet.py", line 99, in resnet_retinanet resnet = keras_resnet.models.ResNet50(inputs, include_top=False, freeze_bn=True) File "/home/conawarehouse/.local/lib/python3.7/site-packages/keras_resnet/models/_2d.py", line 188, in ResNet50 return ResNet(inputs, blocks, numerical_names=numerical_names, block=keras_resnet.blocks.bottleneck_2d, include_top=include_top, classes=classes, *args, *kwargs) File "/home/conawarehouse/.local/lib/python3.7/site-packages/keras_resnet/models/_2d.py", line 66, in ResNet x = keras_resnet.layers.BatchNormalization(axis=axis, epsilon=1e-5, freeze=freeze_bn, name="bn_conv1")(x) File "/anaconda/envs/tensorflow_p36/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 922, in call outputs = call_fn(cast_inputs, args, kwargs) File "/anaconda/envs/tensorflow_p36/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 265, in wrapper raise e.ag_error_metadata.to_exception(e) TypeError: in user code:

/home/conawarehouse/.local/lib/python3.7/site-packages/keras_resnet/layers/_batch_normalization.py:17 call return super(BatchNormalization, self).call(training=(not self.freeze), args, **kwargs)

TypeError: type object got multiple values for keyword argument 'training'

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/fizyr/keras-retinanet/pull/1383#issuecomment-651640932, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABJKFAWTICEGMOOWNUORQ2DRZGOZFANCNFSM4NSEOYIA .

fizyr / keras-retinanet

Pyramid level customization #1383