Abhijit-2592 / Keras_object_detection

Convert any classification model or architecture trained in keras to an object detection model
Apache License 2.0

fcnn validation loss for text detection doesn't converge #5

Closed FabricioProjects closed 6 years ago

FabricioProjects commented 6 years ago

Hey all, as you can see from the TensorBoard plots below, val_loss_rpn_cls and val_loss_class_cls don't converge. I'm using just a single 'text' class, and I have tried different learning rates, VGG16, ResNet50, and numbers of ROIs.

[TensorBoard screenshot: ResNet50 training curves]

Any suggestions on this problem?

Abhijit-2592 commented 6 years ago

@FabricioProjects Looking at your plots, it's overfitting badly. But it's hard to say anything without knowing the data. Did you initialize with ImageNet weights?
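For reference, a minimal sketch of what I mean by initializing from ImageNet weights in Keras (the layer choices here are only illustrative, not this repo's exact setup):

```python
# A minimal sketch, not this repo's exact setup: initialize the VGG16
# backbone from ImageNet weights before training the detector on top.
from keras.applications.vgg16 import VGG16

# include_top=False drops the fully connected head, leaving the
# convolutional feature extractor to use as the detector's base network.
base_model = VGG16(weights='imagenet', include_top=False)

# Optionally freeze the early blocks to reduce overfitting on a small dataset
# (the cut-off at 7 layers is illustrative).
for layer in base_model.layers[:7]:
    layer.trainable = False
```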

FabricioProjects commented 6 years ago

@Abhijit-2592, maybe it's because I'm using just the 'text' class and the default 'bg' class? The problem occurs only in the classifier losses.

Abhijit-2592 commented 6 years ago

A text dataset as in MNIST, or something else? I am unable to speculate without more detail. Yes, I saw that your classification loss is increasing, and I am really surprised that the RPN classification loss is increasing too. It's very common for the final classification layer's loss to increase after some time of training, but in my experience I haven't seen the RPN's loss increase. Could you give more info on your dataset if it isn't confidential? Maybe a few sample images and what you are trying to do?

FabricioProjects commented 6 years ago

I'm using the ICDAR and BORN_DIGITAL datasets with 800 images for training and 120 for validation, around 7000 bboxes in total. The class is always 'text'. Now I'm trying to add some negative samples, like a white image with bboxes 0,0,0,0, but the first results are not very promising. The test results for the TensorBoard scalars above give a mean IoU of 0.73. Below are some dataset samples. The resolution of ICDAR and BORN_DIGITAL images can differ significantly.

[Two sample images from the dataset]

Abhijit-2592 commented 6 years ago

Have a look at this closed issue on this repo; your dataset looks similar to it, I guess:

https://github.com/Abhijit-2592/Keras_object_detection/issues/1

FabricioProjects commented 6 years ago

Thanks for the link, but that is still a multiclass dataset. Are you working on implementing a custom batch size? As far as I know the training only supports batch_size = 1, and I have problems with that because of the constant backpropagation through large images. This makes training very slow.

Abhijit-2592 commented 6 years ago

@FabricioProjects I am not working on implementing a custom batch size because Faster R-CNN is a resource-intensive architecture. Even the TensorFlow developers recommend using a batch size of one when training Faster R-CNN on large images, unless you have access to multiple parallel GPUs (which I don't).

The training-time bottleneck is in the data-generation step. Training can be sped up 3x (at the very least) by queuing the data: the pre-processing happens on the CPU and is then fed into Keras layers running on the GPU, so the GPU sits idle while each batch is prepared. I haven't been able to find time to implement data queuing, but feel free to fork and contribute.
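For anyone who wants to take a stab at it, here is a hedged sketch of the queuing idea using `keras.utils.Sequence`, so `fit_generator` can prefetch batches on CPU worker processes while the GPU trains. `load_image_and_targets` is a hypothetical stand-in for this repo's own preprocessing:

```python
# A hedged sketch of CPU-side data queuing via keras.utils.Sequence.
# load_image_and_targets is a hypothetical placeholder for this repo's
# actual preprocessing (augmentation, anchor target generation, ...).
import numpy as np
from keras.utils import Sequence

class DetectionSequence(Sequence):
    def __init__(self, annotations, batch_size=1):
        self.annotations = annotations
        self.batch_size = batch_size

    def __len__(self):
        return int(np.ceil(len(self.annotations) / float(self.batch_size)))

    def __getitem__(self, idx):
        batch = self.annotations[idx * self.batch_size:(idx + 1) * self.batch_size]
        images, targets = zip(*[load_image_and_targets(a) for a in batch])
        return np.stack(images), np.stack(targets)

# Usage (hypothetical): worker processes keep the queue filled so the GPU
# never waits on preprocessing.
# model.fit_generator(DetectionSequence(train_annotations),
#                     workers=4, use_multiprocessing=True, max_queue_size=10)
```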

Regarding your training:

- What are your image sizes, and what size are you resizing them to? This is a crucial part.
- Try different anchor box sizes, and make sure your anchor sizes cover all the ground-truth bbox sizes (a quick coverage check is sketched below).
- Keep a test set separate from your validation set and measure mAP on it. This should give good information about the accuracy of your model.

I think you are missing a small point somewhere because, from the look of your data, Faster R-CNN shouldn't have any problem learning it.
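A minimal sketch, not from this repo, of the anchor-coverage check mentioned above; the `[x1, y1, x2, y2]` box format and the `(128, 256, 512)` default-style anchor scales are assumptions:

```python
# A minimal sketch: compare ground-truth box sizes against anchor scales.
# Assumes gt_boxes is an (N, 4) array of [x1, y1, x2, y2] boxes in
# resized-image coordinates; (128, 256, 512) mirrors common Faster R-CNN defaults.
import numpy as np

def check_anchor_coverage(gt_boxes, anchor_scales=(128, 256, 512)):
    widths = gt_boxes[:, 2] - gt_boxes[:, 0]
    heights = gt_boxes[:, 3] - gt_boxes[:, 1]
    sizes = np.sqrt(widths * heights)  # geometric-mean side length per box
    print("gt box sizes: %.0f .. %.0f px" % (sizes.min(), sizes.max()))
    print("anchor scales:", anchor_scales)
    # Boxes far below the smallest anchor are likely to be missed by the RPN.
    frac_small = float(np.mean(sizes < min(anchor_scales) / 2.0))
    print("fraction of boxes below half the smallest anchor: %.2f" % frac_small)
```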

PS: Try the VGG-16 meta-architecture first; since the architecture is straightforward, it is easier to debug.
Happy training.

FabricioProjects commented 6 years ago

Thanks for your response @Abhijit-2592, I will take a look at queuing the data.

> What are your image sizes, and what size are you resizing them to?

Right now I'm training on legal text images fixed at 2479 pixels wide and 3508 pixels high. In the algorithm I set `im_size=600` (I don't know if that is a good resize).

> Try different anchor box sizes

Yes, I'm aware of that. I'm measuring the results by computing the mean IoU between predicted bboxes (on separate data) and the ground truths. My best result so far is a mean IoU of 0.73 (I'm trying to get to at least 0.85).

> Try the VGG-16 meta-architecture first

So far I have tried VGG-16 and ResNet50 with ImageNet pre-trained weights. The results are about the same.
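For reference, this is roughly how the mean IoU can be computed; a simplified sketch assuming `[x1, y1, x2, y2]` boxes and predictions already matched one-to-one with ground truths:

```python
# A simplified sketch: assumes [x1, y1, x2, y2] boxes and one-to-one
# matched prediction/ground-truth pairs.
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

def mean_iou(pred_boxes, gt_boxes):
    return np.mean([iou(p, g) for p, g in zip(pred_boxes, gt_boxes)])
```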

Abhijit-2592 commented 6 years ago

@FabricioProjects by resizing to 600 you are shrinking the data to about 1/5th of its original size. Try the maximum size your GPU allows without throwing a resource-exhausted error. If you are able to keep the min side at 2479, then use that. It should give you a marginal improvement.
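A minimal sketch of the min-side resize being described (OpenCV is an assumption here; the repo's own resize code may differ):

```python
# A minimal sketch: scale the image so its shorter side equals im_size,
# preserving aspect ratio. OpenCV usage is an assumption, not this repo's code.
import cv2

def resize_min_side(image, im_size=600):
    h, w = image.shape[:2]
    scale = im_size / float(min(h, w))  # e.g. 2479x3508 -> min side becomes im_size
    resized = cv2.resize(image, (int(round(w * scale)), int(round(h * scale))))
    return resized, scale
```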

Abhijit-2592 commented 6 years ago

@FabricioProjects I hope your problem is solved; I am closing this issue now. Feel free to open other issues if any come up.

ant1pink commented 6 years ago

@Abhijit-2592 I had the same problem: the min side of the images in my dataset is much larger than 600. If I resize the min side to 600, the logo (ROI) I am trying to detect becomes even smaller, and as a result the smallest anchor size from the default settings would be too big for it. So, as you suggested, I changed im_size to 1000 and the anchor size to 32, and I got a better result. But I am not sure how to set num_rois; is larger better in my case?

ant1pink commented 6 years ago

@FabricioProjects What is the dropout rate you applied in your classifier?