Deactivate patch sampling

Cuky88 commented 6 years ago

When I comment out the distorted bounding box crop, the net gives me an error due to NaNs and Infs. I assume that I get invalid bounding box sizes (< 0 or > 1), but with the distorted bounding box crop everything works fine. I double-checked my custom dataset many times. When creating the tfrecords I rescale every box to the range [0, 1], and your provided scripts for checking the tfrecords do not show any bbox values out of range.
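A range check alone can miss degenerate boxes: every coordinate can lie in [0, 1] while the box still has zero or negative height or width. SSD-style box encoding typically takes the log of the box size relative to a default box, so such a box could produce an infinite target. Whether that is the cause here is not confirmed. The sketch below is a hypothetical helper (`find_invalid_boxes` is not part of this repository) and assumes boxes are stored as (ymin, xmin, ymax, xmax) in normalized coordinates.

```python
import math


def find_invalid_boxes(bboxes, min_size=0.0):
    """Return (index, reason) pairs for boxes that are likely to break training.

    Assumption: each box is (ymin, xmin, ymax, xmax), normalized to [0, 1].
    """
    problems = []
    for i, box in enumerate(bboxes):
        ymin, xmin, ymax, xmax = box
        if not all(math.isfinite(v) for v in box):
            problems.append((i, "non-finite coordinate"))
        elif not all(0.0 <= v <= 1.0 for v in box):
            problems.append((i, "coordinate outside [0, 1]"))
        elif ymax - ymin <= min_size or xmax - xmin <= min_size:
            problems.append((i, "zero or negative height/width"))
    return problems


boxes = [
    (0.10, 0.10, 0.50, 0.50),  # valid
    (0.40, 0.20, 0.40, 0.60),  # zero height, yet every value is in range
    (0.20, 0.70, 0.60, 0.30),  # xmin > xmax
    (0.00, 0.00, 1.20, 0.50),  # out of range
]
for index, reason in find_invalid_boxes(boxes):
    print(index, reason)
```

Raising `min_size` slightly above zero would also flag very thin boxes, which may be worth trying for a dataset with many tiny objects.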

Does anyone have the same problem? How can I deactivate the distorted bounding box crop? The cross entropy for positives is not converging, and localization converges very slowly. Deactivating color distortion already helped, but I need to deactivate the cropping!

Another problem I have with the cropping: when I obtain the distorted bbox from the RGB image, I slice the RGB image and also want to slice a grayscale image with the same distorted bbox. But sometimes I cannot slice the grayscale image with the distorted bbox I got from the RGB image because the tensor shapes do not match (both images have the same size). Is this because RGB has 3 channels and the other just 1 channel?
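The channel count by itself should not matter as long as the crop only constrains height and width. In TensorFlow, `tf.image.sample_distorted_bounding_box` returns a size of the form [target_height, target_width, -1], where -1 keeps all channels. The op is also random, so sampling once per image gives two different windows. A mismatch may therefore come from how the window is reused rather than from the channels, though that is a guess without seeing the code. The sketch below uses plain nested lists in place of tensors, and `crop_hw` is a hypothetical helper.

```python
def crop_hw(image, begin_y, begin_x, height, width):
    """Crop rows and columns only; whatever is stored per pixel is kept whole."""
    return [row[begin_x:begin_x + width]
            for row in image[begin_y:begin_y + height]]


H, W = 4, 6
rgb = [[[y, x, 0] for x in range(W)] for y in range(H)]      # H x W x 3
gray = [[[10 * y + x] for x in range(W)] for y in range(H)]  # H x W x 1

# One window, sampled once, reused for both images.
window = dict(begin_y=1, begin_x=2, height=2, width=3)
rgb_crop = crop_hw(rgb, **window)
gray_crop = crop_hw(gray, **window)

print(len(rgb_crop), len(rgb_crop[0]), len(rgb_crop[0][0]))     # 2 3 3
print(len(gray_crop), len(gray_crop[0]), len(gray_crop[0][0]))  # 2 3 1
```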

Any help would be awesome.

LevinJ commented 6 years ago

Hi @Cuky88, disabling the bounding box crop should work fine. I remember I actually started training the model without any data augmentation. Once the training converged and I needed to reduce variance, I added data augmentation back.

I think it should be as simple as commenting out the following lines:

    # dst_image, labels, bboxes, distort_bbox = \
    #     distorted_bounding_box_crop(dst_image, labels, bboxes)

    # tf_summary_image(image, tf.reshape(distort_bbox, (1,-1)), 'cropped_position')
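If the crop needs to be switched on and off repeatedly, a flag may be easier than commenting lines in and out. The sketch below is only an illustration: `preprocess_for_train` and `crop_fn` are hypothetical names, and `crop_fn` is assumed to return the same four values as the call above. The real `distort_bbox` tensor may have a different shape than the placeholder used here.

```python
def preprocess_for_train(image, labels, bboxes, crop_fn=None):
    """Apply the optional crop step; with crop_fn=None the inputs pass through.

    crop_fn is a stand-in for a function such as distorted_bounding_box_crop
    and is assumed to return (image, labels, bboxes, distort_bbox).
    """
    # Full-image box in normalized (ymin, xmin, ymax, xmax) form, so that
    # later code that reads distort_bbox still has something to use.
    distort_bbox = (0.0, 0.0, 1.0, 1.0)
    if crop_fn is not None:
        image, labels, bboxes, distort_bbox = crop_fn(image, labels, bboxes)
    return image, labels, bboxes, distort_bbox


# Placeholder inputs, only to show the pass-through behaviour.
image, labels, bboxes = "image", [1], [(0.1, 0.1, 0.5, 0.5)]
result = preprocess_for_train(image, labels, bboxes)
print(result[3])
```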

What kind of error messages are you seeing when you comment out the bounding box cropping?

Cuky88 commented 6 years ago

Hey, thanks for the reply. This is the error I get when I comment out the distorted bbox crop:

2018-03-19 09:10:26.637159: E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\kernels\check_numerics_op.cc:177] abnormal_detected_host @000000020C39EF00 = {0, 1} LossTensor is inf or nan
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.InvalidArgumentError'>, LossTensor is inf or nan : Tensor had Inf values
         [[Node: train_op/CheckNumerics = CheckNumerics[T=DT_FLOAT, message="LossTensor is inf or nan", _device="/job:localhost/replica:0/task:0/device:GPU:0"](control_dependency)]]
         [[Node: Adam/update_UpProjection/layer16x_br1_BN/scale/ApplyAdam/_7560 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_11670_Adam/update_UpProjection/layer16x_br1_BN/scale/ApplyAdam", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op 'train_op/CheckNumerics', defined at:
  File "train_model.py", line 31, in <module>
    ssd_trainer.start_training()
  File "D:\Devel\CNN_Depth_Estimation\Multi2\modular_SSD_tensorflow\trainer\trainer.py", line 204, in start_training
    train_op = slim.learning.create_train_op(total_loss, optimizer, variables_to_train=variables_to_train, summarize_gradients=False)
  File "C:\ProgramData\Miniconda3\envs\tensorflow\lib\site-packages\tensorflow\contrib\slim\python\slim\learning.py", line 439, in create_train_op
    check_numerics=check_numerics)
  File "C:\ProgramData\Miniconda3\envs\tensorflow\lib\site-packages\tensorflow\contrib\training\python\training\training.py", line 464, in create_train_op
    'LossTensor is inf or nan')
  File "C:\ProgramData\Miniconda3\envs\tensorflow\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 704, in check_numerics
    "CheckNumerics", tensor=tensor, message=message, name=name)
  File "C:\ProgramData\Miniconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "C:\ProgramData\Miniconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 3160, in create_op
    op_def=op_def)
  File "C:\ProgramData\Miniconda3\envs\tensorflow\lib\site-packages\tensorflow\python\framework\ops.py", line 1625, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): LossTensor is inf or nan : Tensor had Inf values
         [[Node: train_op/CheckNumerics = CheckNumerics[T=DT_FLOAT, message="LossTensor is inf or nan", _device="/job:localhost/replica:0/task:0/device:GPU:0"](control_dependency)]]
         [[Node: Adam/update_UpProjection/layer16x_br1_BN/scale/ApplyAdam/_7560 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_11670_Adam/update_UpProjection/layer16x_br1_BN/scale/ApplyAdam", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

I extended the architecture to estimate depths in addition to bounding boxes. Everything works fine when using the bbox distortion. I don't understand what the problem is.

I have to say that I have a lot of smaller bboxes on the horizon line. Does SSD filter them out when doing the default box matching, if they are under a certain size?

I also extended SSD with deconvolution modules, so it can handle smaller bboxes better, but the result is the same when deactivating distortion.

I tested different learning rates, from 0.1 and 0.01 down to 0.00001, but I still get the same error.

LevinJ commented 6 years ago

Hi @Cuky88, I am afraid I am not exactly sure what went wrong after checking the error messages you posted. Maybe you could try removing your code changes and use only the original code without image cropping to verify that everything works okay, and then gradually add back your changes.

Generally speaking, image cropping is quite simple: it crops out a portion of the original image and adjusts the corresponding labels and boxes accordingly.
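As a rough illustration of the label adjustment, here is a minimal sketch in plain Python. It is not the repository's implementation: `crop_bboxes` and the `min_overlap` threshold are hypothetical, and boxes and the crop window are assumed to be (ymin, xmin, ymax, xmax) in normalized coordinates.

```python
def crop_bboxes(bboxes, labels, crop, min_overlap=0.5):
    """Re-express normalized boxes relative to a crop window.

    Assumptions: boxes and crop are (ymin, xmin, ymax, xmax) in [0, 1];
    boxes whose visible fraction falls below min_overlap are dropped.
    """
    cy0, cx0, cy1, cx1 = crop
    ch, cw = cy1 - cy0, cx1 - cx0
    out_boxes, out_labels = [], []
    for (y0, x0, y1, x1), label in zip(bboxes, labels):
        # Intersection of the box with the crop window.
        iy0, ix0 = max(y0, cy0), max(x0, cx0)
        iy1, ix1 = min(y1, cy1), min(x1, cx1)
        inter = max(iy1 - iy0, 0.0) * max(ix1 - ix0, 0.0)
        area = (y1 - y0) * (x1 - x0)
        if area <= 0.0 or inter / area < min_overlap:
            continue
        # Shift and rescale so the crop window becomes the unit square.
        out_boxes.append(((iy0 - cy0) / ch, (ix0 - cx0) / cw,
                          (iy1 - cy0) / ch, (ix1 - cx0) / cw))
        out_labels.append(label)
    return out_boxes, out_labels


boxes = [(0.0, 0.0, 0.25, 0.25), (0.75, 0.75, 1.0, 1.0)]
labels = [1, 2]
new_boxes, new_labels = crop_bboxes(boxes, labels, crop=(0.0, 0.0, 0.5, 0.5))
print(new_boxes, new_labels)
```

A step like this also discards zero-area boxes as a side effect, which might be one reason training behaves differently with cropping enabled. That is speculation, not something verified against this code base.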

JackSparrow3 commented 6 years ago

I have the same error. Did you figure out this problem?

LevinJ / SSD_tensorflow_VOC

Deactivate patch sampling #19