MichalBusta / E2E-MLT

E2E-MLT - an Unconstrained End-to-End Method for Multi-Language Scene Text
MIT License

the problem about process_boxes #70

Open duxiangcheng opened 4 years ago

duxiangcheng commented 4 years ago

Hi, thanks for sharing your amazing code! I have a couple of questions; could you help me?

  1. I don't understand what process_boxes does in train.py:

         if step > 10000 or True: #this is just extra augumentation step ... in early stage just slows down training
           ctcl, gt_b_good, gt_b_all = process_boxes(images, im_data, seg_pred[0], roi_pred[0], angle_pred[0], score_maps, gt_idxs, gtso, lbso, features, net, ctc_loss, opts, debug=opts.debug)
           ctc_loss_val += ctcl.data.cpu().numpy()[0]
           loss = loss + ctcl
           gt_all += gt_b_all
           good_all += gt_b_good

  2. As shown in the code above, the CTC loss (ctc_loss_val) appears to be a validation loss, but I notice that it is added to the total loss and backpropagated. As far as I know, backward() should not be called on a validation loss. Can you explain this?

Thanks!

MichalBusta commented 4 years ago

Hi,

The function just feeds the actual proposals from the detector to the OCR module.

    • We do not use any validation loss. The strategy for selecting a model is: run several checkpoints on your validation dataset and pick the one with the best end-to-end score.
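A minimal sketch of that selection strategy, in case it helps. The function name pick_best_checkpoint, the evaluate_e2e callback, and the checkpoint file names are all hypothetical and not part of the repository; in practice evaluate_e2e would load the checkpoint, run detection plus recognition on your validation images, and return the end-to-end score.

```python
from typing import Callable, Iterable, Tuple


def pick_best_checkpoint(
    checkpoints: Iterable[str],
    evaluate_e2e: Callable[[str], float],
) -> Tuple[str, float]:
    """Return the checkpoint with the highest end-to-end validation score.

    `evaluate_e2e` is a hypothetical callback: it takes a checkpoint path
    and returns a score where higher is better.
    """
    best_path, best_score = None, float("-inf")
    for path in checkpoints:
        score = evaluate_e2e(path)
        if score > best_score:
            best_path, best_score = path, score
    if best_path is None:
        raise ValueError("no checkpoints given")
    return best_path, best_score


# Stand-in scores so the sketch runs without a model (made-up numbers).
fake_scores = {
    "e2e_100000.h5": 0.41,
    "e2e_200000.h5": 0.47,
    "e2e_300000.h5": 0.45,
}
best_path, best_score = pick_best_checkpoint(fake_scores, fake_scores.get)
print(best_path, best_score)
```

No loss value is involved in the selection here; only the end-to-end score on the validation set decides which checkpoint is kept.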

Hope it helps, Michal

duxiangcheng commented 4 years ago

Thank you for your reply. I also used train_ocr.py to pre-train the OCR net, but the CTC loss is unstable and the loss curve looks very strange. [image: screenshot of the loss curve]

MichalBusta commented 4 years ago

What is your batch size? My guess is that it is too small; try something > 64.
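A toy illustration of why a small batch can give a jumpy loss curve. This is not the CTC loss or the code in train_ocr.py; it assumes each sample contributes a loss of 1.0 plus Gaussian noise, and measures how much the per-batch mean fluctuates for two batch sizes. The function name batch_mean_spread is made up for this sketch.

```python
import random
import statistics


def batch_mean_spread(batch_size, num_batches=2000, seed=0):
    """Standard deviation of the batch-mean loss over many simulated batches."""
    rng = random.Random(seed)
    means = []
    for _ in range(num_batches):
        # Hypothetical per-sample losses: true value 1.0 plus unit Gaussian noise.
        batch = [rng.gauss(1.0, 1.0) for _ in range(batch_size)]
        means.append(sum(batch) / batch_size)
    return statistics.pstdev(means)


small = batch_mean_spread(4)    # spread of the logged loss with batch size 4
large = batch_mean_spread(64)   # spread of the logged loss with batch size 64
print(f"batch 4: {small:.3f}   batch 64: {large:.3f}")
```

Under this assumption the spread shrinks roughly with the square root of the batch size, so going from 4 to 64 samples per batch should make the logged loss about four times less noisy.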

AniketGurav commented 2 years ago

Hi, in the function process_boxes, net.forward_ocr is called three times, and I am not clear why. The calls are on lines 270, 276, and 381 of train.py.

From reading the paper, my understanding is that process_boxes runs OCR on the crops extracted by the Localization Module (LM). Those crops are taken from (1) the bounding box coordinates predicted by the LM and (2) the feature map from one of the LM's layers.

But I am not clear about the third OCR call, on line 381.

I referred to Fig. 3 of your paper, https://arxiv.org/pdf/1801.09919.pdf, for understanding.